Abstract

Concentration inequalities are fundamental tools in probabilistic combinatorics and theoretical
computer science for proving that random functions are near their means. Of particular
importance is the case where $f(X)$ is a function of independent random variables
$X = (X_1, \ldots, X_N)$. Here the well known bounded differences inequality (also called McDiarmid's
or Hoeffding-Azuma inequality) establishes sharp concentration if the function $f$ does
not depend too much on any of the variables. One attractive feature is that it relies on a
very simple Lipschitz condition (L): it suffices to show that $|f(X) - f(X')| \le c_k$ whenever
$X, X'$ differ only in $X_k$. While this is easy to check, the main disadvantage is that it considers
worst-case changes $c_k$, which often makes the resulting bounds too weak to be useful.

In this paper we prove a variant of the bounded differences inequality which can be used to 
establish concentration of functions $f(X)$ where (i) the typical changes are small although (ii)
the worst case changes might be very large. One key aspect of this inequality is that it relies
on a simple condition that (a) is easy to check and (b) coincides with heuristic considerations
why concentration should hold. Indeed, given an event $\Gamma$ that holds with very high probability,
we essentially relax the Lipschitz condition (L) to situations where $\Gamma$ occurs. The point is that
the resulting typical changes $c_k$ are often much smaller than the worst case ones.

To illustrate its application we consider the reverse H-free process, where H is 2-balanced.
We prove that the final number of edges in this process is concentrated, and also determine its
likely value up to constant factors. This answers a question of Bollobás and Erdős.

1 Introduction 

In probabilistic combinatorics and theoretical computer science it is often crucial to predict the 
likely value(s) of a random function. More precisely, in many applications f(X) is a function of 
independent random variables $X = (X_1, \ldots, X_N)$, and we need to prove that it is concentrated in
a narrow range around its expected value, i.e., that $f(X)$ typically is about $\mu = \mathbb{E} f(X)$. The crux
is that the functions of interest are often defined in an indirect or complicated way, so that basic
bounds such as Chebyshev's inequality are either hard to evaluate or give error bounds that are too
weak in applications. In this work we thus investigate easy-to-check conditions which ensure that
the function $f(X)$ is close to its mean $\mu$ with very high probability, i.e., that large deviations from
$\mu$ are highly unlikely.
An important paradigm in this area of research (see e.g. [41]) states that a random function
which depends 'smoothly' on many independent random variables should be sharply concentrated,
meaning that $|f(X) - \mu| = o(\mu)$ holds with probability very close to one. In many applications (e.g.
the design of randomized algorithms or random graph theory) each random variable $X_k$ takes values
in a set $A_k$, and in this case a discrete Lipschitz condition for $f: \prod_{j \in [N]} A_j \to \mathbb{R}$ conveniently ensures
that $f(X)$ does not depend too much on any of the variables, where $[N] = \{1, \ldots, N\}$. Perhaps the
most famous result in this context is the bounded differences inequality (also called McDiarmid's or
Hoeffding-Azuma inequality), which is nowadays widely used in discrete mathematics and computer
science, see e.g. the surveys [28, 29]. Here we only state its one-sided version since the analogous
lower tail estimate $\mathbb{P}(f(X) \le \mu - t)$ follows by considering the function $-f(X)$.

Theorem 1 ('Bounded differences inequality'). [35] Let $X = (X_1, \ldots, X_N)$ be a family of independent
random variables with $X_k$ taking values in a set $A_k$. Assume that the function $f: \prod_{j \in [N]} A_j \to \mathbb{R}$
satisfies the following Lipschitz condition:

(L) There are numbers $(c_k)_{k \in [N]}$ such that whenever $x, \tilde{x} \in \prod_{j \in [N]} A_j$ differ only in the $k$-th
coordinate we have

$$ |f(x) - f(\tilde{x})| \le c_k. \tag{1} $$

Let $\mu = \mathbb{E} f(X)$. For any $t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t) \le \exp\left(-\frac{2t^2}{\sum_{k \in [N]} c_k^2}\right). \tag{2} $$
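As a concrete reading of (2), the following minimal Python sketch evaluates its right-hand side for given Lipschitz constants $c_k$. The function name `bounded_differences_tail` and the example numbers are our own illustrative choices, not part of the original text.

```python
import math
from typing import Sequence


def bounded_differences_tail(t: float, c: Sequence[float]) -> float:
    """Right-hand side of (2): exp(-2 t^2 / sum_k c_k^2) for t >= 0."""
    if t < 0:
        raise ValueError("t must be non-negative")
    denom = sum(ck * ck for ck in c)
    if denom == 0:
        # All c_k = 0 means f is constant, so f(X) >= mu + t is impossible for t > 0.
        return 0.0 if t > 0 else 1.0
    return math.exp(-2.0 * t * t / denom)


# f(X) = X_1 + ... + X_100 with X_k in {0, 1}: each coordinate moves f by at most c_k = 1.
bound = bounded_differences_tail(10.0, [1.0] * 100)
print(bound)  # exp(-2), since 2 * 10^2 / 100 = 2
```

Since only $\sum_k c_k^2$ enters, a single large worst-case $c_k$ can dominate the denominator, which is exactly the weakness discussed next.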

While the simplicity of (L) makes this inequality very intuitive and easy to apply, perhaps its
main drawback is that it considers worst case changes. In particular, the resulting concentration
bounds are rather weak (or even trivial) in situations where the worst case $c_k$ are much larger than
the typical changes. A standard example is $f(X)$ counting the number of triangles in the binomial
random graph $G_{n,p}$: since every pair of vertices has up to $n - 2$ common neighbours the worst case
is $c_k = \Theta(n)$, which is much larger than what we expect from the $\Theta(np^2)$ common neighbours we usually
have for $p \ge n^{-1/2+\varepsilon}$. In fact, here Theorem 1 only gives trivial estimates for $p = O(n^{-1/3})$, but it
seems plausible that concentration should hold in such applications where the typical changes are
much smaller than the worst case ones.
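To see the gap between worst-case and typical changes numerically, the sketch below samples $G_{n,p}$ and reports the largest number of common neighbours over all vertex pairs, which is the largest change in the triangle count caused by flipping a single edge. It is an illustrative simulation under our own assumed parameters ($n = 200$, $p = 0.1$, seed 0); the helper names are hypothetical.

```python
import random
from itertools import combinations


def sample_gnp(n: int, p: float, rng: random.Random) -> list[set[int]]:
    """Adjacency sets of one sample of the binomial random graph G(n, p)."""
    adj: list[set[int]] = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # each of the C(n, 2) edges is present independently
            adj[u].add(v)
            adj[v].add(u)
    return adj


def max_codegree(adj: list[set[int]]) -> int:
    """Largest number of common neighbours of a vertex pair u, v.

    Flipping the edge uv changes the triangle count by exactly |N(u) & N(v)|,
    so this is the largest single-coordinate change in the sampled graph.
    """
    return max(len(adj[u] & adj[v]) for u, v in combinations(range(len(adj)), 2))


n, p = 200, 0.1
adj = sample_gnp(n, p, random.Random(0))
print("worst case n - 2:", n - 2)
print("typical n p^2:", n * p * p)
print("largest co-degree in this sample:", max_codegree(adj))
```

In such runs the largest co-degree is typically a small multiple of $np^2$ and far below $n - 2$, which is the situation the typical Lipschitz condition below is designed for.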

This motivated a line of research [24, 35, 36, 42, 43] which focused on tail inequalities of the
form $\mathbb{P}(|f(X) - \mu| \ge t) \le e^{-\vartheta(f, X, t)}$ in situations where (intuitively speaking) the average Lipschitz
coefficients $c_k$ are small. Pioneered by Kim and Vu [24], such results usually require $f(X)$ to have a
special structure (a polynomial of independent random variables of a certain type), reducing their
range of applications compared to (2). Furthermore, the assumptions of such techniques are much
more involved (and harder to check) than the simple Lipschitz condition (L).

In contrast, much less research has been devoted to developing easy-to-use tools for proving
concentration results in such situations. The Hoeffding-Azuma inequality [20, 3] implies, for example,
that (2) essentially remains true if we relax (1) to worst case conditional expected changes:

$$ |\mathbb{E}(f(X) \mid X_1, \ldots, X_k) - \mathbb{E}(f(X) \mid X_1, \ldots, X_{k-1})| \le c_k. \tag{3} $$

While this might be useful in certain textbook examples, it typically has two main drawbacks in
involved combinatorial applications: (a) conditional expectations are usually difficult to calculate
and (b) it often yields no substantial improvement (for, say, $k \ge N/2$ the worst case in (3) over all
choices of $X_1, \ldots, X_k$ is often comparable to (1)). There are also some approaches which allow (1)
to be violated occasionally [13, 14, 23, 29, 37], but these usually require knowledge about conditional
probability distributions, making them particularly difficult to apply when $f(X)$ is defined in an
indirect or complicated way.



1.1 Typical bounded differences inequality 

In this paper we develop a variant of the bounded differences inequality which can be used to 
establish concentration of functions f(X) where (i) the typical changes are small although (ii) the 
worst case changes might be very large. One key aspect of this inequality is that it relies on a simple 
and attractive condition that (a) is easy to check and (b) coincides with heuristic considerations why 
concentration should hold. Indeed, given a 'good' event $\Gamma$ that holds with very high probability, we
essentially relax the Lipschitz condition (L) to situations where $\Gamma$ occurs. More precisely, for the
sake of proving concentration the following inequality usually allows us to restrict our attention to 
such typical changes, which are often much smaller than the worst case ones. 

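Theorem 2 ('Typical bounded differences inequality'). Let $X = (X_1, \ldots, X_N)$ be a family of
independent random variables with $X_k$ taking values in a set $A_k$. Let $\Gamma \subseteq \prod_{j \in [N]} A_j$ be an event
and assume that the function $f: \prod_{j \in [N]} A_j \to \mathbb{R}$ satisfies the following typical Lipschitz condition:

(TL) There are numbers $(c_k)_{k \in [N]}$ and $(d_k)_{k \in [N]}$ with $c_k \le d_k$ such that whenever $x, \tilde{x} \in \prod_{j \in [N]} A_j$
differ only in the $k$-th coordinate we have

$$ |f(x) - f(\tilde{x})| \le \begin{cases} c_k & \text{if } x \in \Gamma, \\ d_k & \text{otherwise.} \end{cases} \tag{4} $$

For any numbers $(\gamma_k)_{k \in [N]}$ with $\gamma_k \in (0, 1]$ there is an event $\mathcal{B} = \mathcal{B}(\Gamma, (\gamma_k)_{k \in [N]})$ satisfying

$$ \mathbb{P}(\mathcal{B}) \le \mathbb{P}(X \notin \Gamma) \cdot \sum_{k \in [N]} \gamma_k^{-1} \quad \text{and} \quad \neg\mathcal{B} \subseteq \Gamma, \tag{5} $$

such that for $\mu = \mathbb{E} f(X)$, $e_k = \gamma_k (d_k - c_k)$ and any $t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) \le \exp\left(-\frac{t^2}{2 \sum_{k \in [N]} (c_k + e_k)^2}\right). \tag{6} $$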
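The two quantitative conclusions (5) and (6) can be evaluated mechanically once $c_k$, $d_k$, $\gamma_k$ and an upper bound on $\mathbb{P}(X \notin \Gamma)$ are fixed. The following Python sketch does exactly that; the function name and the argument `p_not_gamma` (a user-supplied upper bound on $\mathbb{P}(X \notin \Gamma)$) are our own illustrative assumptions.

```python
import math
from typing import Sequence


def typical_bd_bounds(t: float, c: Sequence[float], d: Sequence[float],
                      gamma: Sequence[float], p_not_gamma: float) -> tuple[float, float]:
    """Evaluate the right-hand sides of (6) and (5) from Theorem 2.

    tail: exp(-t^2 / (2 sum_k (c_k + e_k)^2)) with e_k = gamma_k (d_k - c_k)   [bound (6)]
    bad:  p_not_gamma * sum_k 1/gamma_k, where p_not_gamma >= P(X not in Gamma) [bound (5)]
    """
    if not len(c) == len(d) == len(gamma):
        raise ValueError("c, d and gamma must have the same length")
    if t < 0:
        raise ValueError("t must be non-negative")
    if any(not 0 < g <= 1 for g in gamma):
        raise ValueError("each gamma_k must lie in (0, 1]")
    if any(ck > dk for ck, dk in zip(c, d)):
        raise ValueError("(TL) requires c_k <= d_k")
    e = [g * (dk - ck) for ck, dk, g in zip(c, d, gamma)]
    denom = 2.0 * sum((ck + ek) ** 2 for ck, ek in zip(c, e))
    tail = math.exp(-t * t / denom) if denom > 0 else (1.0 if t == 0 else 0.0)
    bad = p_not_gamma * sum(1.0 / g for g in gamma)
    return tail, bad


# One coordinate with typical change 1, worst case 11 and gamma = 0.1, so e_1 = 1.
tail, bad = typical_bd_bounds(4.0, [1.0], [11.0], [0.1], 0.01)
print(tail, bad)  # tail = exp(-16 / 8), bad = 0.01 * 10
```

Shrinking $\gamma_k$ makes $e_k$ negligible in (6) but inflates the bound on $\mathbb{P}(\mathcal{B})$ in (5) by $\gamma_k^{-1}$; this trade-off is what the choice of the compensation factors balances.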

Remark 3. If each $X_k$ takes only two values (i.e., when $|A_k| = 2$) the exponent in (6) may be
multiplied by a factor of 4, analogous to the standard bound (2).

As before, this inequality is only stated for the upper tail since an application to $-f(X)$ yields
the same estimate for $\mathbb{P}(f(X) \le \mu - t)$. One key property of the 'bad' event $\mathcal{B}$ is that it does not
depend on the function $f(X)$, so that (6) can be used as a tail estimate in union bound arguments.
We expect that the typical changes $c_k$ are usually substantially smaller than the worst case $d_k$, and
the 'compensation factor' $\gamma_k$ is supposed to mitigate the effects of the $d_k$ in (6). Indeed, in the typical
application $\gamma_k$ will be very small (this choice is possible if $\Gamma$ holds with very high probability), so
that we can think of $e_k = \gamma_k(d_k - c_k)$ as a negligible 'error term'. With this in mind, perhaps the
most important aspect of Theorem 2 is that it may still yield concentration in situations where
Theorem 1 only gives trivial bounds due to very large worst case Lipschitz coefficients.
To illustrate the ease of application of Theorem 2, consider again the example where $f(X)$
counts the number of triangles in $G_{n,p}$. Define $\Gamma$ as the event that every pair of vertices has at
most $\Delta = \max\{2np^2, n^{\varepsilon}\}$ common neighbours, which fails with probability at most $e^{-\Omega(n^{\varepsilon})}$ by
standard Chernoff bounds. It is straightforward to see that in this case (TL) holds with, say,
$c_k = \Delta$ and $d_k = n$. Setting $\gamma_k = n^{-1}$ we thus have $e_k = o(c_k)$ and $\mathbb{P}(\mathcal{B}) \le e^{-\Omega(n^{\varepsilon})}$, which
means that both terms are negligible for the sake of establishing concentration. It follows that for
$p \ge n^{-2/3+\varepsilon}$ the typical bounded differences inequality (Theorem 2) yields tight concentration of
the number of triangles (with $\mathbb{P}(f(X) \notin (1 \pm n^{-\varepsilon})\mu) \le e^{-n^{\Omega(\varepsilon)}}$, say), whereas Theorem 1 already
fails for $p = n^{-1/3}$. Note that we picked all parameters in a uniform way, setting $c_k = C$, $d_k = D$
and $\gamma_k = \gamma$; this might be convenient in many applications (where $\gamma \approx C/D$ should often suffice).
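The following Python sketch compares the exponents obtained in this triangle example from Theorem 1 (with the worst case $c_k = n$) and from Theorem 2 (with $c_k = \Delta$, $d_k = n$, $\gamma_k = 1/n$), for a deviation $t = n^{-\varepsilon}\mu$. It is a back-of-the-envelope sketch: constants are taken literally, $\mu$ is the exact mean $\binom{n}{3}p^3$, and the contribution of the bad event $\mathcal{B}$ is not included; the function name is hypothetical.

```python
import math


def triangle_exponents(n: int, p: float, eps: float) -> tuple[float, float]:
    """Exponents a in bounds of the form exp(-a) for P(f(X) >= mu + t), t = n^(-eps) * mu,
    where f(X) counts triangles in G(n, p) and the N = C(n, 2) edge indicators are the X_k.

    Theorem 1 with the worst case c_k = n:                    a = 2 t^2 / (N n^2).
    Theorem 2 with c_k = Delta, d_k = n and gamma_k = 1/n:    a = t^2 / (2 N (Delta + e)^2).
    """
    N = math.comb(n, 2)
    mu = math.comb(n, 3) * p ** 3
    t = n ** (-eps) * mu
    thm1 = 2.0 * t * t / (N * n * n)
    delta = max(2.0 * n * p * p, n ** eps)
    e = (n - delta) / n  # e_k = gamma_k (d_k - c_k) with gamma_k = 1/n, hence e_k <= 1
    thm2 = t * t / (2.0 * N * (delta + e) ** 2)
    return thm1, thm2


n = 10 ** 6
thm1, thm2 = triangle_exponents(n, n ** -0.5, 0.05)
print(f"Theorem 1 exponent: {thm1:.3g}, Theorem 2 exponent: {thm2:.3g}")
```

For $p = n^{-1/2}$ the Theorem 1 exponent is tiny (a trivial bound), while the Theorem 2 exponent is in the hundreds, in line with the thresholds $p = n^{-1/3}$ and $p \ge n^{-2/3+\varepsilon}$ stated above.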

1.1.1 Improvement for Bernoulli random variables 

If the underlying probability space is generated by independent Bernoulli random variables we 
establish much stronger estimates. For example, in the common situation where the success probabilities
are all equal to $p$ (as in $G_{n,p}$) the following natural extension of Theorem 2 essentially allows
us to multiply the denominator of (6) by an extra factor of $p$ (on an intuitive level one can perhaps
think of this as applying Theorem 2 after conditioning on $\Theta(Np)$ variables being 'relevant').

Theorem 4 ('Typical bounded differences inequality for 0-1 variables'). Let $X = (X_1, \ldots, X_N)$ be
a family of independent random variables with $X_k \in \{0, 1\}$ and $p_k = \mathbb{P}(X_k = 1)$. Let $\Gamma \subseteq \{0, 1\}^N$ be
an event and assume that the function $f: \{0, 1\}^N \to \mathbb{R}$ satisfies the typical Lipschitz condition (TL)
with $A_k = \{0, 1\}$. For any numbers $(\gamma_k)_{k \in [N]}$ with $\gamma_k \in (0, 1]$ there is an event $\mathcal{B} = \mathcal{B}(\Gamma, (\gamma_k)_{k \in [N]})$
satisfying (5) such that for $\mu = \mathbb{E} f(X)$ and any $t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) \le \exp\left(-\frac{t^2}{2 \sum_{k \in [N]} (1 - p_k) p_k (c_k + e_k)^2 + 2Ct/3}\right), \tag{7} $$

where $e_k = \gamma_k (d_k - c_k)$ and $C = \max_{k \in [N]} (c_k + e_k)$.

Remark 5. If $f(X)$ and $\Gamma$ are either both monotone increasing or decreasing we have

In typical applications of this inequality we hope to be able to ignore the 'error term' $2Ct/3$
(and select $\gamma_k$ such that $c_k + e_k \approx c_k$, as before). In this case (7) is close to $\exp(-t^2/(2\sum_k p_k c_k^2))$, which
for $p_k = o(1)$ is a significant improvement over the corresponding $\exp(-2t^2/\sum_k c_k^2)$ from Remark 3. For
example, in the case of triangles in $G_{n,p}$ this allows us to extend the concentration result of the
previous section to edge probabilities satisfying $p \ge n^{-4/5+\varepsilon}$. In fact, the estimates implied by (8)
are sometimes comparable to those of Janson's inequality [23, 32], see Section 1.2.2.
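The gain from the factor $(1 - p_k)p_k$ in (7) can be seen directly by evaluating it. The Python sketch below implements the right-hand side of (7) and compares it with (6) for $N = 1000$ variables with $p_k = 0.01$ and $c_k = d_k = 1$; the function name and these numbers are our own illustrative assumptions.

```python
import math
from typing import Sequence


def bernoulli_typical_tail(t: float, c: Sequence[float], d: Sequence[float],
                           gamma: Sequence[float], p: Sequence[float]) -> float:
    """Right-hand side of (7) in Theorem 4, where p[k] = P(X_k = 1)."""
    e = [g * (dk - ck) for ck, dk, g in zip(c, d, gamma)]
    var_term = sum((1 - pk) * pk * (ck + ek) ** 2 for ck, ek, pk in zip(c, e, p))
    C = max(ck + ek for ck, ek in zip(c, e))
    denom = 2 * var_term + 2 * C * t / 3
    if denom == 0:
        return 1.0 if t == 0 else 0.0
    return math.exp(-t * t / denom)


# N = 1000 variables with p_k = 0.01 and c_k = d_k = 1 (so e_k = 0), deviation t = 10.
N, t = 1000, 10.0
ones = [1.0] * N
thm4 = bernoulli_typical_tail(t, ones, ones, ones, [0.01] * N)
thm2 = math.exp(-t * t / (2 * N))  # (6) with c_k + e_k = 1
print(thm4, thm2)
```

Here (6) gives only $e^{-0.05}$, whereas (7) gives roughly $e^{-3.8}$, because the sum in the denominator is weighted by $(1 - p_k)p_k \approx 0.01$.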

Ignoring the 'good' event $\Gamma$ in Theorem 4 we also obtain a strengthening of Theorem 1. Since
this natural variant of the bounded differences inequality does not seem to be as widely known, we
explicitly state it for ease of reference (if each $(1 - p_k)p_k$ is weakened to $\max_i \min\{1 - p_i, p_i\}$ then
(9) follows from Theorem 3.9 in McDiarmid's survey [29]; Alon, Kim and Spencer [2] also proved
a comparable inequality that applies to small values of $t$ only: for those the contribution of $Ct$ to
the denominator of (9) is negligible).
Corollary 6 ('Bounded differences inequality for 0-1 variables'). Let $X = (X_1, \ldots, X_N)$ be a family
of independent random variables with $X_k \in \{0, 1\}$ and $p_k = \mathbb{P}(X_k = 1)$. Assume that the function
$f: \{0, 1\}^N \to \mathbb{R}$ satisfies the Lipschitz condition (L) with $A_k = \{0, 1\}$. Let $\mu = \mathbb{E} f(X)$ and
$C = \max_{k \in [N]} c_k$. For any $t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t) \le \exp\left(-\frac{t^2}{2 \sum_{k \in [N]} (1 - p_k) p_k c_k^2 + 2Ct/3}\right). \tag{9} $$

Proof. Apply Theorem 4 with $\Gamma = \{0, 1\}^N$ and $d_k = c_k$. □

This extends Bernstein's inequality (a strengthening of the Chernoff bounds for small deviations,
see e.g. Remark 2.9 in [22]), which applies to sums of independent random variables. One key aspect
of (9) is that it is almost tight when $f(X) = \sum_k X_k$, in which case $V = \operatorname{Var} f(X) = \sum_k p_k(1 - p_k)$
and $c_k = 1$. Indeed, the estimate of Corollary 6 is then close to $e^{-t^2/(2V)}$ for $t$ not too large, which
is exactly the tail behaviour predicted by the central limit theorem.
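For a sum of Bernoulli variables the exact tail is available, so (9) and (2) can be checked against it. The Python sketch below computes $\mathbb{P}(\mathrm{Bin}(n, p) \ge \mu + t)$ exactly (summing the probability mass function in log-space) and compares it with both bounds; the parameters $n = 1000$, $p = 0.01$, $t = 10$ and all function names are illustrative assumptions.

```python
import math


def binomial_upper_tail(n: int, p: float, k0: int) -> float:
    """Exact P(Bin(n, p) >= k0) for 0 < p < 1, summing the pmf in log-space to avoid overflow."""
    total = 0.0
    for k in range(max(k0, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def corollary6_bound(t: float, n: int, p: float) -> float:
    """(9) for f(X) = X_1 + ... + X_n with p_k = p and c_k = 1, so C = 1."""
    V = n * p * (1 - p)
    return math.exp(-t * t / (2 * V + 2 * t / 3))


def theorem1_bound(t: float, n: int) -> float:
    """(2) with c_k = 1 for each of the n coordinates."""
    return math.exp(-2 * t * t / n)


n, p, t = 1000, 0.01, 10.0
k0 = math.ceil(n * p + t)  # f(X) >= mu + t  <=>  f(X) >= k0, since f is integer-valued
exact = binomial_upper_tail(n, p, k0)
print(exact, corollary6_bound(t, n, p), theorem1_bound(t, n))
```

With these numbers the exact tail is a few thousandths, (9) gives about $0.02$, and (2) gives about $0.82$; the variance-sensitive denominator of (9) is what closes most of the gap.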

Remark 7. Our arguments in fact yield a slightly stronger form of (7) and (9), analogous to Bennett's
sharpening of the Chernoff bounds (see e.g. Remark 2.9 in [22]). Indeed, for $\varphi(x) = (1 + x)\log(1 + x) - x$
we can improve terms of the form $e^{-t^2/(2V + 2Ct/3)}$ to $e^{-(V/C^2)\varphi(Ct/V)}$, where $V$ equals $\sum_k (1 - p_k)p_k(c_k + e_k)^2$
and $\sum_k (1 - p_k)p_k c_k^2$ in (7) and (9), respectively. For $t = \omega(V/C)$ these refined estimates sharpen
the exponents from order $\Theta(t/C)$ to $\Theta(t/C \cdot \log(Ct/V))$, i.e., yield a logarithmic improvement.
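The Bennett-type form of Remark 7 is never worse than the Bernstein-type form, because of the standard inequality $\varphi(x) \ge x^2/(2(1 + x/3))$ for $x \ge 0$, and for large $t$ it gains the logarithmic factor. The Python sketch below evaluates both exponents for the illustrative values $V = 10$, $C = 1$ (our own choice).

```python
import math


def phi(x: float) -> float:
    """phi(x) = (1 + x) log(1 + x) - x from Remark 7 (defined for x > -1)."""
    return (1.0 + x) * math.log1p(x) - x


def bernstein_exponent(t: float, V: float, C: float) -> float:
    """Exponent of e^{-t^2/(2V + 2Ct/3)}, the form appearing in (7) and (9)."""
    return t * t / (2.0 * V + 2.0 * C * t / 3.0)


def bennett_exponent(t: float, V: float, C: float) -> float:
    """Exponent of the sharpened form e^{-(V/C^2) phi(Ct/V)} from Remark 7."""
    return (V / C ** 2) * phi(C * t / V)


V, C = 10.0, 1.0
for t in (1.0, 10.0, 100.0, 5000.0):
    print(t, bernstein_exponent(t, V, C), bennett_exponent(t, V, C))
```

For small $t$ the two exponents nearly coincide, while for $t = 5000$ (so $Ct/V = 500$) the Bennett exponent is several times larger, reflecting the extra $\log(Ct/V)$ factor.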

Remark 8. Theorem 4 and Corollary 6 extend with minor modifications to the case where each
$X_k$ takes values in a set $A_k$ and satisfies $\max_{\eta \in A_k} \mathbb{P}(X_k = \eta) \ge 1 - p_k$. Indeed, (7) and (9) both
hold after deleting $(1 - p_k)$ and replacing $c_k + e_k$ with $c_k + e_k \cdot (1 - p_k)^{-1}$.



1.1.2 Two-sided Lipschitz conditions 

The typical Lipschitz condition (TL) is 'one-sided': $|f(x) - f(\tilde{x})| \le c_k$ is supposed to hold if $x \in \Gamma$.
This keeps the formulas simple, but in many applications it is easier (and perhaps more natural)
to verify a 'two-sided' condition where $x, \tilde{x} \in \Gamma$ holds. The following theorem states that we may
use a two-sided variant of (TL) at the cost of slightly increasing the 'error term' $e_k$.

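Theorem 9 ('Two-sided typical Lipschitz condition'). Theorems 2 and 4 and the corresponding Remarks remain
valid with $e_k = 2\gamma_k(d_k - c_k)q_k^{-1}$ and $\min_{\eta \in A_k} \mathbb{P}(X_k = \eta) \ge q_k$ if the Lipschitz condition (4) of (TL)
is replaced by the following two-sided variant:

$$ |f(x) - f(\tilde{x})| \le \begin{cases} c_k & \text{if } x, \tilde{x} \in \Gamma, \\ d_k & \text{otherwise.} \end{cases} \tag{10} $$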

Whenever $q_k^{-1}$ is not too big, (10) seems the most convenient condition: it is much simpler to
check than (4) and does not substantially deteriorate the error bounds. For example, in the random
graph $G_{n,p}$ we usually have $q_k^{-1} \le n^2$, which in the typical application with $\mathbb{P}(X \notin \Gamma) \le n^{-\omega(1)}$ can
be compensated by adapting $\gamma_k$ accordingly (also note that $(1 - p_k)p_k q_k^{-1} \le 1$ in case of Theorem 4).
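One way to "adapt $\gamma_k$ accordingly" is to choose it so that the two-sided error term $e_k = 2\gamma_k(d_k - c_k)q_k^{-1}$ is at most $c_k$; the price is a factor $\gamma_k^{-1}$ (now larger by roughly $q_k^{-1}$) in the bound (5) on $\mathbb{P}(\mathcal{B})$. The rule and the numbers below ($c_k = 2$, $d_k = 1000$, $q_k = 0.01$) are a hypothetical illustration in Python, not a prescription from the text.

```python
def two_sided_gamma(c: float, d: float, q: float, ratio: float = 1.0) -> float:
    """Largest gamma in (0, 1] such that the two-sided error term of Theorem 9,
    e = 2 * gamma * (d - c) / q, is at most ratio * c (a hypothetical tuning rule)."""
    if d <= c:
        return 1.0  # no gap between typical and worst case, so e = 0 for any gamma
    return min(1.0, ratio * c * q / (2.0 * (d - c)))


# Hypothetical numbers: c_k = 2, d_k = 1000, q_k = 0.01 (e.g. G(n, p) with p = 0.01).
c, d, q = 2.0, 1000.0, 0.01
gamma = two_sided_gamma(c, d, q)
e = 2.0 * gamma * (d - c) / q  # equals c by construction
print(gamma, e, 1.0 / gamma)   # 1/gamma is the per-variable factor in the bound (5) on P(B)
```

The resulting $\gamma_k^{-1}$ is polynomial in the parameters, so it is harmless whenever $\mathbb{P}(X \notin \Gamma)$ is superpolynomially small.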

Remark 10. As pointed out by Oliver Riordan it is possible to bootstrap (10) from (4) by modifying
the good event. Indeed, defining $\Gamma' \subseteq \Gamma$ such that for $x \in \Gamma'$ any single coordinate change results in
a sample point satisfying $\tilde{x} \in \Gamma$, it follows that the one-sided condition $x \in \Gamma'$ implies the two-sided
condition $x, \tilde{x} \in \Gamma$. Using the bound $\mathbb{P}(X \notin \Gamma') \le \sum_{k \in [N]} q_k^{-1}\mathbb{P}(X \notin \Gamma)$ this approach often leads
to estimates that are comparable with Theorem 9 (in fact, monotonicity of $\Gamma$ also transfers to $\Gamma'$).
In some applications (a) the $q_k$ are very small and (b) exploiting $\tilde{x} \in \Gamma$ when bounding $|f(x) -
f(\tilde{x})|$ is difficult, in which case neither the two-sided (10) nor the one-sided (4) seems suitable. In
an attempt to deal with such situations we introduce an intermediate variant, which is 'locally' two-sided:
it only requires the (one-sided) typical Lipschitz condition (4) to hold when each coordinate
of both sample points $x, \tilde{x}$ satisfies some local 'good' event $x_j, \tilde{x}_j \in \Gamma_j$.

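Theorem 11 ('Typical bounded differences inequality with truncation'). Let $X = (X_1, \ldots, X_N)$ be
a family of independent random variables with $X_k$ taking values in a set $A_k$. Suppose $(\Gamma_k)_{k \in [N]}$ and
$\Gamma \subseteq \prod_{j \in [N]} \Gamma_j$ are events with $\Gamma_k \subseteq A_k$. Assume that the function $f: \prod_{j \in [N]} A_j \to \mathbb{R}$ satisfies the
Lipschitz condition (4) of (TL) only for all $x, \tilde{x} \in \prod_{j \in [N]} \Gamma_j$ that differ only in the $k$-th coordinate,
and that $|f(x) - f(\tilde{x})| \le s$ for all $x, \tilde{x} \in \prod_{j \in [N]} A_j$. For any numbers $(\gamma_k)_{k \in [N]}$ with $\gamma_k \in (0, 1]$
there is an event $\mathcal{B} = \mathcal{B}(\Gamma, (\gamma_k)_{k \in [N]})$ satisfying (5) such that for $\mu = \mathbb{E} f(X)$, $\Delta = s\,\mathbb{P}(X \notin \Gamma)$,
$e_k = \gamma_k(d_k - c_k)$ and any $t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t + \Delta \text{ and } \neg\mathcal{B}) \le \exp\left(-\frac{t^2}{2 \sum_{k \in [N]} (c_k + e_k)^2}\right). \tag{11} $$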

Remark 12. If $\Gamma = \prod_{j \in [N]} \Gamma_j$ holds we may set $\mathcal{B} = \neg\Gamma$, $e_k = 0$ and multiply the exponent of (11)
by a factor of 4. If certain monotonicity properties hold we can remove the $\Delta$ term: for example,
we can set $\Delta = 0$ if $f(x) \ge f(\tilde{x})$ whenever $x, \tilde{x} \in \prod_{j \in [N]} A_j$ differ only in their $k$-th coordinates
$x_k \in A_k \setminus \Gamma_k$ and $\tilde{x}_k \in \Gamma_k$.

Theorem 11 seems particularly useful when the underlying random variables are grouped into
larger blocks $B_k$, so that each $X'_k$ now takes values in its own product space $A'_k = \prod_{j \in B_k} A_j$ (by
construction the $X'_k$ are again independent). For example, the so-called 'vertex exposure' of $G_{n,p}$
uses $n - 1$ blocks, where $X'_k$ corresponds to the group of edges $E_k = (v_k v_{k+1}, \ldots, v_k v_n)$. In this
case $q_k \le p^{n-k}$ and the 'good' event $\Gamma$ of, say, having at most $S = \max\{2np, n^{\varepsilon}\}$ neighbours can
dramatically fail after changing the $k$-th coordinate (the degree of $v_k$ can change by up to $n - k$). Here
we can overcome these issues using the 'local' event $\Gamma_k$ that at most $S$ edges of $E_k$ are present,
so that after a one-coordinate change of $x \in \prod_j \Gamma_j$ from $x_k$ to $\tilde{x}_k \in \Gamma_k$ the degree of every vertex
changes by at most $S$ (if $x \in \Gamma$ then every vertex has at most $2S$ neighbours after the change). In other words, the
local $\Gamma_k$ and global $\Gamma$ can complement each other in order to mitigate the large worst case effects (in
particular when many variables are associated with each coordinate).

Theorem 11 also allows us to routinely apply certain truncation arguments (without ad-hoc
calculations). A typical example is $f(X) = \sum_k X_k$ with $X_k$ having exponential tails, where one
often first proves concentration of, say, $\sum_k \min\{X_k, C\log N\}$, and then transfers this result to the
original sum, see e.g. [10, 12]. Here (11) almost immediately yields concentration of $f(X)$ via the
local events $\Gamma_k$ that $X_k \le C\log N$ occurs (setting $\Gamma = \prod_j \Gamma_j$, $d_k = c_k = C\log N$ and $\gamma_k = 1$).
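As a worked instance of this truncation argument, take (as an assumption for illustration) $X_k$ i.i.d. exponential with mean 1, so $\mu = N$ and $\mathbb{P}(X_k > L) = e^{-L}$. With $L = C\log N$ and $\Gamma = \prod_j \Gamma_j$, Remark 12 gives $\mathcal{B} = \neg\Gamma$, $e_k = 0$, $\Delta = 0$ (the sum is increasing in each $X_k$) and an extra factor 4 in the exponent of (11). The Python sketch below just evaluates the resulting two numbers; the function name is hypothetical.

```python
import math


def truncated_exp_sum_bounds(N: int, C: float, t: float) -> tuple[float, float]:
    """Bounds for f(X) = X_1 + ... + X_N with X_k i.i.d. Exp(1) (illustrative example).

    Local events Gamma_k = {X_k <= L} with L = C log N, and Gamma = prod_k Gamma_k.
    Via Remark 12: B = not Gamma, e_k = 0, Delta = 0, and the exponent of (11) gains a factor 4.
    Returns (tail, bad) with
      tail = exp(-2 t^2 / (N L^2))   bound on P(f(X) >= N + t and not B)
      bad  = min(1, N e^{-L})        union bound on P(B), since P(X_k > L) = e^{-L} = N^{-C}
    """
    L = C * math.log(N)
    tail = math.exp(-2.0 * t * t / (N * L * L))
    bad = min(1.0, N * math.exp(-L))
    return tail, bad


tail, bad = truncated_exp_sum_bounds(10 ** 4, 3.0, 10 ** 4)
print(tail, bad)  # bound for P(f(X) >= 2 mu and not B), and the bound N^(1 - C) on P(B)
```

The truncation level enters only through $c_k = C\log N$, so the bound loses a $\log^2 N$ factor in the exponent compared with bounded variables, while $\mathbb{P}(\mathcal{B}) \le N^{1-C}$ is negligible for $C > 1$.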

1.1.3 Dynamic exposure of the variables 

The previous inequalities can be refined by exposing the values of the random variables $X_i$ one
by one in an adaptive order. Intuitively this allows us to exploit that after having learned the
values of certain variables, some other $X_j$ may no longer influence the value of $f(X)$. This
approach was introduced by Alon, Kim and Spencer [2], and is particularly useful whenever we
can determine $f(X)$ without knowing the values of all random variables. More formally, a strategy
sequentially exposes $X_{q_1}, X_{q_2}, \ldots$, where each index $q_i = q_i(X_{q_1}, \ldots, X_{q_{i-1}})$ may depend on the
previous outcomes and indices (we use the convention that $q_{k+1} = q_k$ if $f(X)$ is determined by
$(X_{q_1}, \ldots, X_{q_k})$ with $k < N$); every strategy has a natural representation in the form of a decision tree.
With a fixed strategy in mind, for every possible outcome $X = (X_1, \ldots, X_N)$ we obtain a set of
queried indices $Q \subseteq [N]$, and by $\mathcal{Q}$ we denote the set of all possible such query sets $Q$. The resulting
key improvement is that in most inequalities we essentially may replace $k \in [N]$ with $k \in Q$ for
some 'worst case' set of indices $Q \in \mathcal{Q}$ (note that $\gamma_k = \gamma$ is a typical choice in applications).
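A toy example makes the replacement of $\sum_{k \in [N]}$ by $\max_{Q \in \mathcal{Q}} \sum_{k \in Q}$ concrete. The Python sketch below uses a hypothetical function of three bits, $f(X) = X_0 + (X_1 \text{ if } X_0 = 1 \text{ else } X_2)$, with the adaptive strategy "expose $X_0$, then $X_1$ or $X_2$". It computes the worst-case constants $c_k$ by brute force, enumerates the family $\mathcal{Q}$ of query sets, and checks that the queried coordinates determine $f$.

```python
from itertools import product


def f(x):
    # Hypothetical example on three bits: read X_0, then X_1 or X_2 depending on X_0.
    return x[0] + (x[1] if x[0] == 1 else x[2])


def strategy(x):
    """Adaptive query order: expose X_0 first, then X_1 if X_0 = 1 and X_2 otherwise."""
    return frozenset({0, 1 if x[0] == 1 else 2})


def lipschitz_constants(func, n):
    """Brute-force c_k = max |f(x) - f(x')| over x, x' differing only in coordinate k."""
    c = [0] * n
    for x in product((0, 1), repeat=n):
        for k in range(n):
            y = list(x)
            y[k] = 1 - y[k]
            c[k] = max(c[k], abs(func(x) - func(tuple(y))))
    return c


def determines_f(func, strat, n):
    """Check that f(x) only depends on the coordinates queried for x."""
    outcomes = list(product((0, 1), repeat=n))
    for x in outcomes:
        Q = strat(x)
        for y in outcomes:
            if all(x[k] == y[k] for k in Q) and func(x) != func(y):
                return False
    return True


c = lipschitz_constants(f, 3)                                # [2, 1, 1]
family = {strategy(x) for x in product((0, 1), repeat=3)}    # the set of query sets Q
static = sum(ck ** 2 for ck in c)                            # sum over k in [N]
dynamic = max(sum(c[k] ** 2 for k in Q) for Q in family)     # max over Q of sum over k in Q
print(c, sorted(map(sorted, family)), static, dynamic)
```

Here $\sum_{k} c_k^2 = 6$ while $\max_{Q} \sum_{k \in Q} c_k^2 = 5$; in genuine applications, where each outcome queries only a small fraction of the $N$ variables, the saving can be far larger.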

Theorem 13 ('Dynamic exposure of the variables'). Suppose that $\gamma_k = \gamma$ for all $k \in [N]$. For any
strategy, Theorems 2 and 4, Corollary 6 and the corresponding Remarks remain valid with $\sum_{k \in [N]}$
replaced by $\max_{Q \in \mathcal{Q}} \sum_{k \in Q}$ and $\max_{k \in [N]}$ replaced by $\max_{Q \in \mathcal{Q}} \max_{k \in Q}$, with the addition that $\mathcal{B}$
depends on the query strategy.

Theorem 14 ('Monotone dynamic exposure of the variables'). Consider any strategy satisfying
$q_{i+1} > q_i$ in each step. Then Theorems 2 and 4, Corollary 6 and the corresponding Remarks
remain valid with $\sum_{k \in [N]}$ replaced by $\max_{Q \in \mathcal{Q}} \sum_{k \in Q}$ and $\max_{k \in [N]}$ replaced by $\max_{Q \in \mathcal{Q}} \max_{k \in Q}$,
with the exception that (5) remains unchanged.

Applied to Corollary 6 and the related results above, these results tighten and extend an inequality
of Alon, Kim and Spencer [2], which is based on the Lipschitz condition (L). In certain applications
dynamic exposure yields significant improvements, and for an illustrating example we refer to
Claim 2 in [2], where it is crucial to reduce (the order of magnitude of) the number of queried
variables. Further refinements are possible by using adaptive Lipschitz bounds, which are perhaps
most easily exploited by tailoring the proof arguments to the specific application.

One key feature of Theorem 14 is that the 'bad' event $\mathcal{B}$ does not depend on the strategy used,
making it particularly useful in union bound arguments. As an illustration consider the example
where $f(X) = f_U(X)$ counts, in $G_{n,p}$, the number of triangles in a subset $U \subseteq V$ of the vertices.
Define $\Gamma$ as in Section 1.1. Since $f(X)$ depends only on edges in $U$, using Theorems 2 and 14
(sequentially exposing all edges in $U$) we infer for $|U| = u \ge u_0 = u_0(n, p)$ and $\Delta_U = \min\{\Delta, u\}$ that

$$ \mathbb{P}\bigl(f(X) \notin (1 \pm n^{-\varepsilon})\mu \text{ and } \neg\mathcal{B}\bigr) \le \exp\bigl(-\Theta(\cdots)\bigr) \le n^{-\omega(u)}. $$

Taking a union bound, the probability that some $U \subseteq V$ with $|U| \ge u_0$ has the 'wrong' number
of triangles is at most $\mathbb{P}(\mathcal{B}) + n^{-\omega(u_0)}$. Here we crucially exploited that $\mathcal{B}$ is a 'global' event not
depending on $f(X)$ or $U$, so that $\mathbb{P}(\mathcal{B}) \le e^{-\Omega(n^{\varepsilon})}$ does not need to 'compete' with the $n^u$ choices for
the subsets (this issue often makes traditional bad events ineffective in union bound arguments).

1.1.4 Weakening the independence assumption 

The concentration results discussed so far extend to certain dependent random variables $X =
(X_1, \ldots, X_N)$ that are generated by a sequence of 'nearly' independent (or uniform) random choices.
As we shall see, they e.g. apply to random permutations $\pi \in S_n$ and uniform random graphs $G_{n,m}$.
To motivate the new (GL) condition below we consider independent random variables, in which
case the mapping $\rho_k: \Sigma_a \to \Sigma_b$ that changes the value of the $k$-th coordinate from $a$ to $b$ is a
bijection. Here (L) yields $|f(x) - f(\rho_k(x))| \le c_k$ and independence implies $\mathbb{P}(X = x \mid X \in \Sigma_a) =
\mathbb{P}(X = \rho_k(x) \mid X \in \Sigma_b)$. With this in mind (12) can be viewed as a natural analogue of (4) in which
the outcomes $x$ and $\tilde{x} = \rho_k(x)$ may differ in more than just one coordinate, and (13) accounts for
the fact that the variables are not necessarily independent.
Theorem 15 ('General bounded differences inequality'). Let $X = (X_1, \ldots, X_N)$ be a family of
random variables with $X_k$ taking values in a set $A_k$. Let $\Gamma \subseteq \prod_{j \in [N]} A_j$ be an event. Then the
conclusions of Theorem 2 remain valid if instead of (TL) the function $f: \prod_{j \in [N]} A_j \to \mathbb{R}$ satisfies
the following general Lipschitz condition:

(GL) There are numbers $(c_k)_{k \in [N]}$ and $(d_k)_{k \in [N]}$ with $c_k \le d_k$ such that the following holds for any
two possible sequences of outcomes $a_1, \ldots, a_{k-1}, a$ and $a_1, \ldots, a_{k-1}, b$ of $X_1, \ldots, X_k$. Defining

$$ \Sigma_z = \Bigl\{x = (a_1, \ldots, a_{k-1}, z, x_{k+1}, \ldots, x_N) \in \prod_{j \in [N]} A_j : \mathbb{P}(X = x) > 0\Bigr\}, $$

there is an injection $\rho_k = \rho_k(\Sigma_a, \Sigma_b): \Sigma_a \to \Sigma_b$ such that for all $x \in \Sigma_a$ we have

$$ |f(x) - f(\rho_k(x))| \le \begin{cases} c_k & \text{if } x \in \Gamma, \\ d_k & \text{otherwise,} \end{cases} \quad \text{and} \tag{12} $$

$$ \mathbb{P}(X = x \mid X \in \Sigma_a) \le \mathbb{P}(X = \rho_k(x) \mid X \in \Sigma_b). \tag{13} $$

Remark 16. The proof shows that $\rho_k$ must be a bijection with equality in (13). Furthermore, if $X_k$
takes at most two values conditioned on $X_1, \ldots, X_{k-1}$, then the exponent in (6) may be multiplied
by a factor of 4. In fact, (2) holds if $\Gamma = \prod_{j \in [N]} A_j$ (or $r_k = 0$ below). In addition, for (6) to hold
with $e_k = \gamma_k r_k$, it suffices if we relax (GL) to the average Lipschitz condition

$$ |\mathbb{E}(f(X) \mid X \in \Sigma_a) - \mathbb{E}(f(X) \mid X \in \Sigma_b)| \le c_k + r_k \mathbb{P}(X \notin \Gamma \mid X \in \Sigma_a). \tag{14} $$

To illustrate the application of the (GL) condition we consider uniform permutations $\pi \in S_n$,
which are generated by sequentially choosing each $\pi(k)$ uniformly at random from $[n] \setminus \{\pi(1), \ldots, \pi(k-1)\}$.
Here $\Sigma_z$ contains all $\pi$ with $\pi(k) = z$ and $\pi(j) = a_j$ for $1 \le j < k$. In this case a bijection
$\rho_k: \Sigma_a \to \Sigma_b$ is defined by the transposition of $a$ and $b$, so that $\pi' = \rho_k(\pi)$ satisfies $\pi'(k) = b$,
$\pi'(\pi^{-1}(b)) = a$ and $\pi'(i) = \pi(i)$ for $\pi(i) \notin \{a, b\}$. Using $|\Sigma_a| = |\Sigma_b| = (n - k)!$ and the uniform
measure it is not hard to check that (13) holds with equality. We see that for establishing (12)
it suffices to bound $|f(\pi) - f(\pi')|$ whenever $\pi$ and $\pi'$ are related via a transposition, which is an
intuitive and easy to check condition (this may correspond to changing two coordinates).
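The transposition map can be checked by brute force for small $n$. The Python sketch below (permutations written as tuples $(\pi(1), \ldots, \pi(n))$ of the values $0, \ldots, n-1$; the helper names and the instance $n = 6$, prefix $(3)$, $a = 0$, $b = 5$ are our own) enumerates $\Sigma_a$ and $\Sigma_b$ and verifies that swapping the values $a$ and $b$ maps $\Sigma_a$ bijectively onto $\Sigma_b$. Under the uniform measure $\mathbb{P}(X = x \mid X \in \Sigma_z) = 1/|\Sigma_z|$, so equal sizes give (13) with equality.

```python
from itertools import permutations
from math import factorial


def sigma(n, prefix, z):
    """Sigma_z for uniform permutations of {0, ..., n-1} written as tuples (pi(1), ..., pi(n)):
    all pi whose first k - 1 values equal prefix and whose k-th value equals z (k = len(prefix) + 1)."""
    k = len(prefix) + 1
    return {pi for pi in permutations(range(n))
            if pi[:k - 1] == tuple(prefix) and pi[k - 1] == z}


def rho(pi, a, b):
    """Apply the transposition of the values a and b: a -> b, b -> a, all other values fixed."""
    return tuple(b if v == a else a if v == b else v for v in pi)


n, prefix, a, b = 6, (3,), 0, 5  # here k = 2, and a, b do not occur in the prefix
k = len(prefix) + 1
S_a, S_b = sigma(n, prefix, a), sigma(n, prefix, b)
images = {rho(pi, a, b) for pi in S_a}
print(len(S_a), len(S_b), images == S_b)
```

Because $\rho_k$ only swaps two values, bounding $|f(\pi) - f(\rho_k(\pi))|$ amounts to the transposition condition described in the paragraph above.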

One key aspect of (GL) is that it often maintains the simplicity of (L) and (TL). Here uniform
probability measures are particularly convenient, for which it suffices to first define bijections $\rho_k:
\Sigma_a \to \Sigma_b$ and then check (12) only (using $\mathbb{P}(X = x \mid X \in \Sigma_a) = \mathbb{P}(X = x)/\mathbb{P}(X \in \Sigma_a)$, these must
satisfy (13) with equality). Indeed, extending the permutations example, for random sequences
$T = (t_1, \ldots, t_m)$ of $m$ distinct elements from $W$ it is enough to estimate $|f(T) - f(T')|$ whenever
both sequences are related by changing one coordinate (i.e., $t_k \ne t'_k$) or interchanging the order
of two coordinates (i.e., $t_k = t'_j$ and $t_j = t'_k$). Note that this example includes the random graph
process and various hypergraph processes as special cases. Since every set with $m$ elements gives
rise to $m!$ ordered sequences, the above result also readily carries over to uniform random subsets
$S \subseteq W$ of size $|S| = m$: it suffices to bound $|f(S) - f(S')|$ whenever the sets are minimally different,
i.e., satisfy $|S \cap S'| = m - 1$ (note that for $m > |W|/2$ better results are obtained by choosing the
complement uniformly at random). Here the uniform random graph $G_{n,m}$ and uniform hypergraphs
are special cases. Note that the above construction also extends to multiple (independent) random
objects; for example, if $M$ random subsets $X = (S_1, \ldots, S_M)$ with $S_i \subseteq W_i$ and $|S_i| = m_i$ are chosen
independently it suffices to consider $|f(X) - f(X')|$ only for the cases where $X$ and $X'$ are minimally
different in one set. Finally, with similar easy-to-check conditions (GL) also applies, for example,
to finite metric spaces, perfect matchings and the configuration model, see e.g. [30, 46, 9].
Several extensions of Theorem 2 carry over to Theorem 15 with some minor modifications, and
results analogous to those of Sections 1.1.1 and 1.1.2, including a two-sided Lipschitz condition, are
stated below (Remark 7 also applies to (15) after adjusting $V$ accordingly).

Theorem 17 ('General bounded differences inequality for asymmetric variables'). Let $X = (X_1, \ldots, X_N)$
be a family of random variables with $X_k$ taking values in a set $A_k$, where $\max_{\eta \in A_k} \mathbb{P}(X_k = \eta \mid
X_1, \ldots, X_{k-1}) \ge 1 - p_k$ holds. Let $\Gamma \subseteq \prod_{j \in [N]} A_j$ be an event and assume that the function
$f: \prod_{j \in [N]} A_j \to \mathbb{R}$ satisfies the general Lipschitz condition (GL). For any numbers $(\gamma_k)_{k \in [N]}$ with
$\gamma_k \in (0, 1]$ there is an event $\mathcal{B} = \mathcal{B}(\Gamma, (\gamma_k)_{k \in [N]})$ satisfying (5) such that for $\mu = \mathbb{E} f(X)$ and any
$t \ge 0$ we have

$$ \mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) \le \exp\left(-\frac{t^2}{2 \sum_{k \in [N]} p_k \bigl(c_k + e_k (1 - p_k)^{-1}\bigr)^2 + 2Ct/3}\right), \tag{15} $$

where $e_k = \gamma_k (d_k - c_k)$ and $C = \max_{k \in [N]} (c_k + e_k)$.

Theorem 18 ('Two-sided general Lipschitz condition'). Theorems 15 and 17 and Remark 16 remain
valid with $e_k = 2\gamma_k(d_k - c_k)q_k^{-1}$ and $\min_{\eta \in A_k} \mathbb{P}(X_k = \eta \mid X_1, \ldots, X_{k-1}) \ge q_k$ if the Lipschitz
condition (12) of (GL) is replaced by the following two-sided variant:

$$ |f(x) - f(\rho_k(x))| \le \begin{cases} c_k & \text{if } x, \rho_k(x) \in \Gamma, \\ d_k & \text{otherwise.} \end{cases} \tag{16} $$

In addition, $q_k \le |A_k|^{-1}$ suffices when all possible outcomes occur with the same probability.

The sufficient condition $q_k \le |A_k|^{-1}$ often makes the two-sided Lipschitz condition of Theorem 18
easy to apply. For example, in case of random permutations $\pi \in S_n$ and random graphs $G_{n,m}$ (or
the random graph process) we may take $q_k = n^{-1}$ and $q_k = n^{-2}$, respectively.

1.2 Discussion and applications 
1.2.1 A wider perspective 

As discussed, in probabilistic combinatorics and the analysis of randomized algorithms we frequently
need to prove that a random function is not too far from its mean, e.g., that $f(X) \approx \mu$ or $f(X) \le 2\mu$
holds. A common feature of many recent applications is that the functions of interest are only
'smooth enough' on a high probability event, whereas their deterministic worst case changes are too
large for the standard bounded differences inequality (Theorem 1) to be effective.

In these cases there is no general method, but in the past certain ad-hoc arguments have
been successfully used in such situations (see e.g. [11, 25, 26, 33]). Usually the key idea is to
construct a random function $g(X)$ that is a smooth approximation of $f(X)$, whose Lipschitz
coefficients are small by construction (approximation usually means
that $f(X) \approx g(X)$ holds with high probability, but often $\mathbb{E} f(X) \approx \mathbb{E} g(X)$ is also needed). Here
smoothness makes it possible to apply concentration inequalities (often the bounded differences
inequality) to $g(X)$, whereas the approximation property ensures that concentration transfers from
$g(X)$ to $f(X)$. The main disadvantage of this approach is that it relies on ad-hoc arguments (which
can be involved); in particular, finding suitable approximation functions may require ingenuity.

One aim of this paper is to provide easy-to-apply tools which can routinely deal with such
situations, establishing concentration in a rather simple way. For example, in the frequent case
where the good event $\Gamma$ holds with probability at least $1 - N^{-\omega(1)}$ we can typically choose $\gamma_k^{-1} =
\max|f(X)|$ and then completely ignore the worst case effects, see e.g. the proof of Theorem [55] (this
approach also applies, for example, to Lemma 14 in [33] and parts of the martingale-based proof of
Theorem 2.2 in [26]). In other words, the crucial advantage of our new inequalities is that they can
often remove the need for sometimes difficult ad-hoc arguments using only a minimum amount of
calculations (which typically even coincide with heuristic considerations).



1.2.2 Comparison with Janson's inequality 

In this section we demonstrate that in certain applications our inequalities give exponential
estimates that (i) are tight and (ii) successfully compete with the well-known Janson inequality. To
this end we focus on subgraph counts in the binomial random graph $G_{n,p}$, since a concrete example
seems more illustrative to us. Henceforth we assume that $H$ is a fixed 2-balanced graph, i.e., where
$H$ has $e_H \ge 2$ edges and all its proper subgraphs $G \subsetneq H$ with $v_G \ge 3$ vertices satisfy

$$\frac{e_G - 1}{v_G - 2} \le \frac{e_H - 1}{v_H - 2} = d_2(H). \tag{17}$$
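For small graphs, condition (17) can be checked mechanically. The Python sketch below is our own illustration: the names `d2`, `m2` and `is_two_balanced` are hypothetical and do not come from the paper. It brute-forces over vertex subsets. This suffices because, for a fixed vertex set, the induced subgraph has the most edges and therefore the largest value of $(e_G - 1)/(v_G - 2)$.

```python
from fractions import Fraction
from itertools import combinations


def d2(v, e):
    """2-density d_2 = (e - 1)/(v - 2), with the convention d_2(K_2) = 1/2."""
    if v == 2:
        return Fraction(1, 2)
    return Fraction(e - 1, v - 2)


def induced_edges(edges, subset):
    """Number of edges of the graph induced on `subset`."""
    s = set(subset)
    return sum(1 for a, b in edges if a in s and b in s)


def is_two_balanced(vertices, edges):
    """Brute-force check of (17): e_H >= 2 and every induced subgraph on
    3 <= |S| < v_H vertices has 2-density at most d_2(H). Proper subgraphs on
    all v_H vertices have fewer edges, hence a smaller 2-density."""
    v_h, e_h = len(vertices), len(edges)
    if e_h < 2:
        return False
    target = d2(v_h, e_h)
    return all(d2(size, induced_edges(edges, s)) <= target
               for size in range(3, v_h)
               for s in combinations(vertices, size))


def m2(vertices, edges):
    """m_2(H) = max d_2(G) over subgraphs G with e_G >= 1 (brute force)."""
    best = Fraction(1, 2) if edges else None   # any single edge is a copy of K_2
    for size in range(3, len(vertices) + 1):
        for s in combinations(vertices, size):
            e = induced_edges(edges, s)
            if e >= 1:
                best = max(best, d2(size, e))
    return best


K4 = list(combinations(range(4), 2))
K4_pendant = K4 + [(3, 4)]                          # K_4 with an extra edge hanging off
print(is_two_balanced(list(range(4)), K4))          # True
print(is_two_balanced(list(range(5)), K4_pendant))  # False
print(d2(5, 7), m2(list(range(5)), K4_pendant))     # 2 5/2
```

For instance, $K_4$ is 2-balanced with $d_2(K_4) = 5/2$. Adding a pendant edge destroys 2-balancedness, and then $m_2(H) = 5/2 > d_2(H) = 2$.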

This class of graphs includes, for example, complete graphs and cycles of arbitrary size. Let $Y_H$
count the number of copies of $H$ in $G_{n,p}$, and write $\mu = \mathbb{E}Y_H$. For 2-balanced graphs it is well known (see e.g. [22]) that
Janson's inequality gives

$$\mathbb{P}(Y_H \le \mu - t) \le \exp\Bigl(-\Theta\Bigl(\frac{t^2}{\mu^2/(n^2p)}\Bigr)\Bigr) \tag{18}$$

for $p \ge n^{-1/d_2(H)}$. Spencer [38] proved that, assuming $p \ge n^{-1/d_2(H)}(\log n)^b$ with $b = b(H) \le 2$,
for every $c > 0$ the following holds with probability at least $1 - n^{-c}$ for $n \ge n_0(c, H)$: every pair
$xy$ of vertices is contained in at most $\Lambda = O(n^{v_H-2}p^{e_H})$ extensions to copies of $H$ (for which
adding the edge $xy$ completes a copy of $H$ containing $xy$). The latter event will be our decreasing $\Gamma$,
which allows us to use $c_k = O(n^{v_H-2}p^{e_H-1}) = O(\mu/(n^2p))$ as well as $d_k = n^{v_H}$ and $\gamma_k = n^{-v_H}$ in
our typical bounded differences inequality, so that $e_k = \gamma_k(d_k - c_k) = o(c_k)$. Applying Spencer's
result with $c = v_H + 3$ we have $\mathbb{P}(\mathcal{B}) \le \sum_k \gamma_k^{-1}\mathbb{P}(X \notin \Gamma) \le n^{-1}$. Note that, since for the lower tail we have
$t = O(\mu)$, it follows that $\sum_k p\,c_k^2 + t\max_k c_k = O(\mu^2/(n^2p))$. For the decreasing function $f = -Y_H$,
a combination of (8) and (7) now yields

$$\mathbb{P}(Y_H \le \mu - t) \le \exp\Bigl(-\Theta\Bigl(\frac{t^2}{\mu^2/(n^2p)}\Bigr)\Bigr),$$

which asymptotically matches (18), i.e., the estimate of Janson's inequality. In fact, for $t = \Theta(\mu)$
this bound is best possible (up to constants in the exponent), since $G_{n,p}$ contains no edges (and
thus no copies of $H$) with probability $e^{-\Theta(n^2p)}$.
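As a sanity check of the variance term $\sum_k p\,c_k^2 = O(\mu^2/(n^2p))$, the following sketch treats triangles ($H = K_3$). It rests on an assumption of ours: each $c_k$ is replaced by the expected number $(n-2)p^2$ of triangle extensions of a vertex pair, which stands in for the order of magnitude $n^{v_H-2}p^{e_H-1}$ used above. Under that assumption the ratio can be computed in closed form and equals $18n/(n-1)$ for every $p$. The function name is hypothetical.

```python
from math import comb


def variance_ratio_triangles(n, p):
    """Ratio of sum_k p * c_k^2 to mu^2/(n^2 p) for H = K_3 in G(n, p).
    Assumption (illustration only): all N = C(n, 2) edge indicators have p_k = p,
    and c_k is taken to be (n - 2) p^2, the expected number of triangle extensions."""
    mu = comb(n, 3) * p**3                  # E Y_{K_3}
    c = (n - 2) * p**2                      # stand-in for c_k
    variance_proxy = comb(n, 2) * p * c**2  # sum over all edge indicators
    return variance_proxy / (mu**2 / (n**2 * p))


# Independent of p, the ratio is 18n/(n-1), i.e. the two quantities agree up to constants.
print(variance_ratio_triangles(1000, 0.05), 18 * 1000 / 999)
```

Since the ratio stays bounded, the Bernstein-type exponent $t^2/(2V + 2Ct/3)$ is of the same order as Janson's exponent $t^2/(\mu^2/(n^2p))$, in line with the discussion above.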



1.2.3 Application: the reverse H-free process 

The following variations of the classical random graph processes were proposed by Bollobás and
Erdős at the 1990 Quo Vadis, Graph Theory conference in an attempt to improve Ramsey numbers [5, 11].
In the $H$-free process, starting with an empty graph on $n$ vertices, in each step
a new edge is added, chosen uniformly at random from all pairs whose addition does not complete
a copy of $H$. In the reverse $H$-free process, starting with a complete graph on $n$ vertices, in
each step an edge is removed, chosen uniformly at random from all edges that are contained in a
copy of $H$. In the $H$-removal process, starting with a complete graph on $n$ vertices, in each
step all $e_H$ edges of a copy of $H$ are removed, where the copy is selected uniformly at random from all
copies of $H$. All of these processes end with an $H$-free graph, and Bollobás and Erdős asked (among
other structural properties) what their typical final number of edges is [5, 11].
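To make the reverse $H$-free process concrete, here is a direct simulation for the special case $H = K_3$. It is a minimal sketch that follows the definition above literally and favours clarity over speed, so it is only meant for small $n$. The function name is hypothetical.

```python
import random


def reverse_triangle_free_process(n, seed=None):
    """Reverse H-free process for H = K_3: start from K_n and repeatedly delete an
    edge chosen uniformly at random among all edges that lie in at least one triangle."""
    rng = random.Random(seed)
    adj = [set(range(n)) - {v} for v in range(n)]
    while True:
        # uv lies in a triangle iff u and v still have a common neighbour
        in_triangle = [(u, v) for u in range(n) for v in sorted(adj[u])
                       if u < v and adj[u] & adj[v]]
        if not in_triangle:
            break
        u, v = rng.choice(in_triangle)
        adj[u].discard(v)
        adj[v].discard(u)
    return [(u, v) for u in range(n) for v in sorted(adj[u]) if u < v]


final_edges = reverse_triangle_free_process(20, seed=1)
print(len(final_edges))   # number of edges of the final (triangle-free) graph
```

Each deletion removes an edge $uv$ from some triangle $uvw$, so the edges $uw$ and $vw$ survive that step. In particular, the final graph always has at least two edges.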

These variations have received considerable attention in recent years, in particular the $H$-free
process. Its typical final number of edges is nowadays known up to logarithmic factors [8] for
the class of strictly 2-balanced graphs $H$, where the inequality in (17) is strict. Matching bounds
up to constant factors have only been established for some special forbidden graphs and the class
of $C_\ell$-free processes, see e.g. [44, 45]. The final graph of the $K_s$-free process also yields the best
known lower bounds on the Ramsey numbers $R(s,t)$ with $s \ge 4$, see [4, 6]. Recently Makai [27]
determined the (asymptotic) final number of edges of the reverse $H$-free process for the class of
strictly 2-balanced graphs, but its final graph yields no new estimates for $R(s,t)$. Although the
related $H$-removal process has been studied in several papers, the final number of edges is known
up to multiplicative $n^{o(1)}$ factors only in the special case $H = K_3$, see e.g. [5, 34, 39].

Using our typical bounded differences inequality, in Section 3 we show that the final number
of edges in the reverse $H$-free process is sharply concentrated when $H$ is 2-balanced (we do not
assume strictly 2-balanced), and also determine the likely number of edges up to constants. This is
in contrast to all known results for the widely studied $H$-free and $H$-removal processes. Indeed, for
these processes (a) no sharp concentration results are known, (b) the order of magnitude of the final number
of edges is open for most strictly 2-balanced graphs, and (c) no general results apply to the class
of 2-balanced graphs. As we shall see, when $H$ is a matching the expected final number of edges
in the reverse $H$-free process is $O(1)$. When it comes to concentration we thus restrict our main
attention to all other 2-balanced graphs $H$, which in fact satisfy $d_2(H) \ge 1$ (with equality for trees).
Here our next result shows that the reverse $H$-free process typically ends with $\Theta(n^{2-1/d_2(H)})$ edges,
answering (up to constant factors) the aforementioned question of Bollobás and Erdős from 1990.

Theorem 19. Let $H$ be a 2-balanced graph. There are constants $a, A > 0$ such that the final
number of edges $M_n$ in the reverse $H$-free process has expectation satisfying $a n^{2-1/d_2(H)} \le \mathbb{E}M_n \le A n^{2-1/d_2(H)}$.
Furthermore, for any $c > 0$ we have $|M_n - \mathbb{E}M_n| \le \sqrt{\mathbb{E}M_n}\,(\log n)^{4e_H}$ with probability
at least $1 - n^{-c}$ for $n \ge n_0(c, H)$.

Our arguments partially generalize to arbitrary graphs. Set $d_2(K_2) = 1/2$ and $m_2(H) = \max_{G \subseteq H,\, e_G \ge 1} d_2(G)$,
so that $m_2(H) = d_2(H)$ for 2-balanced graphs $H$. We show that for any
graph the expected final number of edges in the reverse $H$-free process is $\Theta(n^{2-1/m_2(H)})$, and prove
concentration under certain conditions (satisfied e.g. by a clique $K_r$ with an extra edge hanging
off), see Section 3. The proof of Theorem 19 also extends to a finite family of forbidden graphs $\mathcal{H}$,
which for the $\mathcal{H}$-free process was considered in [31]. Indeed, defining the reverse $\mathcal{H}$-free process in
the obvious way (always removing a random edge that is contained in a copy of some $H \in \mathcal{H}$), we
obtain, for example, the following generalization.

Theorem 20. Let $\mathcal{H} = \{H_1, \ldots, H_r\}$ be a family of 2-balanced graphs. Define $H, J \in \mathcal{H}$ such that
$d_2(H) = \min_{F \in \mathcal{H}} d_2(F)$ and $e_J = \max_{F \in \mathcal{H}} e_F$. There are constants $a, A > 0$ such that the final
number of edges $M_n$ in the reverse $\mathcal{H}$-free process has expectation satisfying $a n^{2-1/d_2(H)} \le \mathbb{E}M_n \le A n^{2-1/d_2(H)}$.
Furthermore, for any $c > 0$ we have $|M_n - \mathbb{E}M_n| \le \sqrt{\mathbb{E}M_n}\,(\log n)^{4e_J}$ with probability
at least $1 - n^{-c}$ for $n \ge n_0(c, \mathcal{H})$.

1.3 Organization of the paper 

Section 2 is devoted to the proof of our new concentration inequalities, which are then illustrated
by an application to the reverse $H$-free process in Section 3.

2 Proofs of the concentration inequalities 

We start by proving two general martingale inequalities. These are applied in Section 2.2, where
we establish our variants of the bounded differences inequality.

2.1 Martingale inequalities 

Our concentration results are based on the following variants of Hoeffding-Azuma/Bernstein-type
martingale inequalities. Since they are not stated exactly in this form in the literature, we give
short proofs for the reader's convenience (following the slick approach of Freedman [17]). In both we
assume that $(\mathcal{F}_k)_{0 \le k \le N}$ is an increasing sequence of $\sigma$-algebras, and $(M_k)_{0 \le k \le N}$ is an $(\mathcal{F}_k)_{0 \le k \le N}$-adapted
bounded martingale.

Lemma 21 ('Bounded differences martingale inequality'). Let $L_k$ and $U_k$ be $\mathcal{F}_{k-1}$-measurable
variables satisfying $L_k \le M_k - M_{k-1} \le U_k$. Set $S_k = \sum_{i \in [k]} (U_i - L_i)^2$. For every $t \ge 0$ and $S > 0$
we have

$$\mathbb{P}(M_k \ge M_0 + t \text{ and } S_k \le S \text{ for some } k \in [N]) \le e^{-2t^2/S}. \tag{19}$$

Lemma 22 ('Bounded variances martingale inequality'). Let $U_k$ be an $\mathcal{F}_{k-1}$-measurable variable
satisfying $M_k - M_{k-1} \le U_k$. Set $C_k = \max_{i \in [k]} U_i$ and $V_k = \sum_{i \in [k]} \mathrm{Var}(M_i - M_{i-1} \mid \mathcal{F}_{i-1})$. Let
$\varphi(x) = (1+x)\log(1+x) - x$. For every $t \ge 0$ and $V, C > 0$ we have

$$\mathbb{P}(M_k \ge M_0 + t,\ V_k \le V \text{ and } C_k \le C \text{ for some } k \in [N]) \le e^{-(V/C^2)\varphi(Ct/V)} \le e^{-t^2/(2V + 2Ct/3)}. \tag{20}$$
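The right-hand sides of (19) and (20) are simple closed-form expressions. The sketch below evaluates them so they can be compared on concrete numbers. The parameter values in the usage lines are a hypothetical example: many steps, each of range at most 1, but with small conditional variance. This is exactly the regime in which the variance-based bound (20) beats (19).

```python
import math


def phi(x):
    """phi(x) = (1 + x) log(1 + x) - x, as in Lemma 22."""
    return (1 + x) * math.log1p(x) - x


def bound_19(t, s):
    """Right-hand side of (19): exp(-2 t^2 / S)."""
    return math.exp(-2 * t**2 / s)


def bound_20(t, v, c):
    """Both right-hand sides of (20): the Bennett-type exp(-(V/C^2) phi(Ct/V))
    and the weaker Bernstein-type exp(-t^2 / (2V + 2Ct/3))."""
    bennett = math.exp(-(v / c**2) * phi(c * t / v))
    bernstein = math.exp(-t**2 / (2 * v + 2 * c * t / 3))
    return bennett, bernstein


# Hypothetical example: N steps with U_k - L_k = 1 (so S = N), but conditional
# variance p(1 - p) per step (so V = N p (1 - p)) and one-step bound C = 1.
N, p, t = 10_000, 0.01, 50.0
print(bound_19(t, N))
print(bound_20(t, N * p * (1 - p), 1.0))
```

In this example (19) only gives $e^{-1/2}$, whereas both forms of (20) are of order $e^{-10}$ or smaller.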

Remark 23. Note that $V_k$ generalizes $S_k$ since $\mathrm{Var}(M_i - M_{i-1} \mid \mathcal{F}_{i-1}) = \mathbb{E}((M_i - M_{i-1})^2 \mid \mathcal{F}_{i-1})$
holds (it is not hard to check that $V_k \le S_k/4$). In fact, Lemmas 21 and 22 extend with minor
modifications to supermartingales: defining $V_k = \sum_{i \in [k]} \mathbb{E}((M_i - M_{i-1})^2 \mid \mathcal{F}_{i-1})$ suffices.

Observe that we allow for (accumulative) random bounds on the one-step changes (and other
quantities), which in the case of Lemma 21 is the main difference to the usual formulation of the classical
Hoeffding-Azuma inequality [20, 3]. Lemma 22 also extends the related Theorem 2.2.2 of Kim and
Vu [23] (see also Lemma 3.1 in Vu's survey [43]), which assumes that the underlying probability
space is generated by independent random variables (of a special form).

Note that $L_k, U_k$ are $\mathcal{F}_{k-1}$-measurable, whereas $M_k - M_{k-1}$ is $\mathcal{F}_k$-measurable. This difference
sometimes causes subtle off-by-one errors. As pointed out by Oliver Riordan, for the estimate

$$\mathbb{P}(M_N \ge M_0 + t) \le e^{-t^2/(2\sum_k c_k^2)} + \eta, \tag{21}$$

for example, it does not suffice that $\sum_k \mathbb{P}(|M_k - M_{k-1}| > c_k) \le \eta$, as claimed by Theorem 8.4 in [13]. The problem
is that, conditional on $\mathcal{F}_{k-1}$, it may always be possible for $|M_k - M_{k-1}| \le c_k$ to fail in the next step
(although this might be unlikely). With this in mind, we see that (21) holds e.g. if

$$\sum_k \mathbb{P}\bigl(\text{it is possible, given } M_1, \ldots, M_{k-1}, \text{ that } |M_k - M_{k-1}| > c_k\bigr) \le \eta.$$

In fact, assuming that $|M_k - M_{k-1}| \le C_k$ always holds, the approach of [13, 37] implies (21) if, for example,

$$\sum_k (1 + 2C_k/t) \cdot \mathbb{P}(|M_k - M_{k-1}| > c_k/4) \le \eta.$$

2.1.1 Proof of Lemmas 21 and 22

Our proofs use the following (standard) inequalities due to Hoeffding [20] and Steiger [40]; they 
follow e.g. from the proofs of Lemmas 2.4, 2.6 and 2.8 in McDiarmid's survey [29] . 

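Lemma 24. Let $X$ be a random variable with $\mathbb{E}(X \mid \mathcal{F}) = 0$. Let $L, U$ be $\mathcal{F}$-measurable random
variables. Set $g(x) = (e^x - 1 - x)/x^2$ for $x \neq 0$ and $g(0) = 1/2$. For any $\lambda > 0$ the following holds:

$$L \le X \le U \implies \mathbb{E}(e^{\lambda X} \mid \mathcal{F}) \le e^{\lambda^2 (U-L)^2/8} \quad\text{and} \tag{22}$$

$$X \le U \implies \mathbb{E}(e^{\lambda X} \mid \mathcal{F}) \le e^{\lambda^2 g(\lambda U)\,\mathrm{Var}(X \mid \mathcal{F})}. \tag{23}$$

Furthermore, $g(x)$ is a non-negative increasing function. □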

Lemma 25. Set $\varphi(x) = (1+x)\log(1+x) - x$. For all $x \ge 0$ we have $\varphi(x) \ge x^2/(2 + 2x/3)$. □
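Lemmas 24 and 25 are standard, but their two analytic facts are easy to probe numerically: $g$ is non-negative and increasing, and $\varphi(x) \ge x^2/(2 + 2x/3)$. The sketch below checks both on a finite grid. This is an illustration only and of course not a proof. Near $0$ it switches to a Taylor expansion of $g$ to avoid floating-point cancellation.

```python
import math


def g(x):
    """g(x) = (e^x - 1 - x)/x^2 with g(0) = 1/2, as in Lemma 24."""
    if x == 0:
        return 0.5
    if abs(x) < 1e-4:
        return 0.5 + x / 6 + x**2 / 24      # Taylor expansion avoids cancellation
    return (math.expm1(x) - x) / x**2


def phi(x):
    """phi(x) = (1 + x) log(1 + x) - x, as in Lemma 25."""
    return (1 + x) * math.log1p(x) - x


# Numerical sanity check on a grid (illustration only, not a proof)
grid = [i / 100 for i in range(-2000, 2001)]
values = [g(x) for x in grid]
assert all(v >= 0 for v in values)
assert all(a < b for a, b in zip(values, values[1:]))    # g is increasing on the grid
assert all(phi(x) >= x**2 / (2 + 2 * x / 3) - 1e-12
           for x in (i / 100 for i in range(0, 10001)))  # Lemma 25 on [0, 100]
```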

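Proof of Lemmas 21 and 22. Set

$$W_k = \sum_{i \in [k]} g(\lambda U_i)\,\mathrm{Var}(M_i - M_{i-1} \mid \mathcal{F}_{i-1}).$$

The key point is that $M_{k-1}$ and $U_k$, $S_k$, $W_k$ are $\mathcal{F}_{k-1}$-measurable. So, by applying (22) and
(23) to $\mathbb{E}(e^{\lambda(M_k - M_{k-1})} \mid \mathcal{F}_{k-1})$, we see that

$$Y_k = e^{\lambda(M_k - M_0) - \lambda^2 S_k/8} \quad\text{and}\quad Z_k = e^{\lambda(M_k - M_0) - \lambda^2 W_k}$$

satisfy $\mathbb{E}(Y_k \mid \mathcal{F}_{k-1}) \le Y_{k-1}$ and $\mathbb{E}(Z_k \mid \mathcal{F}_{k-1}) \le Z_{k-1}$, i.e., they are supermartingales. We define the
stopping time $T$ as the minimum of $N$ and the smallest $k \in [N]$ with $M_k - M_0 \ge t$; as usual, we
write $i \wedge T$ as shorthand for $\min\{i, T\}$. By construction $(Y_{k \wedge T})_{0 \le k \le N}$ and $(Z_{k \wedge T})_{0 \le k \le N}$ are both
supermartingales. In particular, we have

$$\mathbb{E}Y_{N \wedge T} \le \mathbb{E}Y_0 = 1 \quad\text{and}\quad \mathbb{E}Z_{N \wedge T} \le \mathbb{E}Z_0 = 1.$$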

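Let $\mathcal{E}_N$ denote the event that $M_k \ge M_0 + t$ and $S_k \le S$ for some $k \in [N]$. Note that $\mathcal{E}_N$ implies
$Y_{N \wedge T} = Y_T \ge e^{\lambda t - \lambda^2 S/8}$. So, for $\lambda = 4t/S$, Markov's inequality gives

$$\mathbb{P}(\mathcal{E}_N) \le \mathbb{P}(Y_{N \wedge T} \ge e^{\lambda t - \lambda^2 S/8}) \le e^{\lambda^2 S/8 - \lambda t} = e^{-2t^2/S},$$

which establishes (19) and thus Lemma 21.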

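We proceed similarly for $(Z_{k \wedge T})_{0 \le k \le N}$ and let $\mathcal{E}'_N$ denote the event that $M_k \ge M_0 + t$, $V_k \le V$
and $C_k \le C$ for some $k \in [N]$. Using $W_T \le g(\lambda C)V_T$ (since $U_i \le C_T \le C$ for $i \le T$ and $g(x) \ge 0$ is increasing), we see that $\mathcal{E}'_N$
implies $Z_{N \wedge T} \ge e^{\lambda t - \lambda^2 g(\lambda C)V_T} \ge e^{\lambda t - \lambda^2 g(\lambda C)V}$. Recall that $\varphi(x) = (1+x)\log(1+x) - x$. For
$\lambda = \log(1 + Ct/V)/C$, Markov's inequality and Lemma 25 now yield

$$\mathbb{P}(\mathcal{E}'_N) \le e^{\lambda^2 g(\lambda C)V - \lambda t} = e^{-(V/C^2)\varphi(Ct/V)} \le e^{-t^2/(2V + 2Ct/3)},$$

which establishes (20) and thus Lemma 22. □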
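The two choices of $\lambda$ in the proof above can be double-checked by direct substitution. The short derivation below is a worked check added for the reader, not part of the original argument. It uses only the definitions of $g$ (Lemma 24) and $\varphi$ (Lemma 22), and writes $x = Ct/V$.

```latex
% Choice \lambda = 4t/S in the proof of (19):
\frac{\lambda^2 S}{8} - \lambda t
  = \frac{16 t^2}{S^2}\cdot\frac{S}{8} - \frac{4t^2}{S}
  = -\frac{2t^2}{S}.

% Choice \lambda = \log(1+x)/C with x = Ct/V in the proof of (20); note \lambda C = \log(1+x)
% and \lambda t = (V/C^2)\, x \log(1+x):
\lambda^2 g(\lambda C)\, V - \lambda t
  = \frac{V}{C^2}\bigl(e^{\lambda C} - 1 - \lambda C\bigr) - \frac{V}{C^2}\, x\log(1+x)
  = \frac{V}{C^2}\bigl(x - \log(1+x) - x\log(1+x)\bigr)
  = -\frac{V}{C^2}\,\varphi(x).
```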
2.2 Bounded differences inequalities

The textbook proof of Theorem 1 is based on the Hoeffding-Azuma inequality [20, 3], and essentially
uses the 'worst case' Lipschitz condition (1) to apply Lemma 21 with $|U_k - L_k| \le c_k$. We need
some modifications to deal with the obstacle that the 'good' event $\Gamma$, and thus the 'typical case'
in (4), does not always hold; these are partially inspired by the seminal work of Shamir and
Spencer [37] from 1987.

When $\mathbb{P}(X \notin \Gamma) \le \eta$ holds, one might be tempted to add $\eta$ to the error bound and then always
assume that $\Gamma$ holds. The problem is that in the martingale-based proof one needs to estimate
conditional expected changes as in (3). So, informally speaking, even when $X \in \Gamma$ the 'good' event
can still fail 'inside' the corresponding expectations. One can try to overcome this by conditioning
on $\Gamma$, but this usually introduces a new technical problem: then the variables are not conditionally
independent (in which case Lipschitz conditions comparable to (4) no longer suffice to bound the
expected changes). These technicalities seem to cause some confusion in e.g. [15, 18].

We sidestep these issues by noting that for good bounds on conditional expected one-step
changes it suffices that the conditional probabilities of large changes are small. One key aspect of
our approach is that we can always guarantee this via the 'global' event $\Gamma$ only, i.e., without having
any knowledge about the corresponding conditional distributions.

2.2.1 The general approach 

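We now introduce the setup used in all subsequent proofs. Let $Y = f(X)$. We consider the
increasing sequence of sub-$\sigma$-fields $\mathcal{F}_k$ generated by $X_1, \ldots, X_k$. Using Doob's construction, the
sequence $Y_k = \mathbb{E}(Y \mid \mathcal{F}_k)$ is a martingale with $Y_0 = \mathbb{E}f(X) = \mu$ and $Y_N = f(X)$. Now we define
$\mathcal{F}_{k-1}$-measurable events $\mathcal{B}_{k-1}$, where $\omega \in \mathcal{B}_{k-1}$ if

$$\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1})(\omega) > \gamma_k. \tag{24}$$

Let $\mathcal{B} = \neg\Gamma \cup \bigcup_{k \in [N]} \mathcal{B}_{k-1}$. Note that $\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_0) = \mathbb{P}(X \notin \Gamma)$ yields $\mathbb{P}(\mathcal{B}_0) = 0$ if $\gamma_1 \ge \mathbb{P}(X \notin \Gamma)$,
and $\mathbb{P}(\mathcal{B}_0) = 1$ otherwise. Using $\gamma_1 \in (0, 1]$ we infer $\mathbb{P}(\neg\Gamma \cup \mathcal{B}_0) \le \gamma_1^{-1}\mathbb{P}(X \notin \Gamma)$. Observing that
$\mathbb{P}(X \notin \Gamma) = \mathbb{E}(\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1})) \ge \gamma_k\mathbb{P}(\mathcal{B}_{k-1})$, the union bound now gives

$$\mathbb{P}(\mathcal{B}) \le \mathbb{P}(\neg\Gamma \cup \mathcal{B}_0) + \sum_{2 \le k \le N} \mathbb{P}(\mathcal{B}_{k-1}) \le \sum_{k \in [N]} \gamma_k^{-1} \cdot \mathbb{P}(X \notin \Gamma). \tag{25}$$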
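The bound (25) can be verified exactly on a small example. The sketch below uses a toy model of our own choosing: $X_1, \ldots, X_N$ are i.i.d. Bernoulli($p$) and $\Gamma = \{x : \sum_k x_k \le m\}$. In this model $\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1})$ depends only on the sum of the first $k-1$ coordinates. The code enumerates all $2^N$ outcomes, determines membership in the events $\mathcal{B}_{k-1}$ from (24), and compares $\mathbb{P}(\mathcal{B})$ with the right-hand side of (25). All function names are hypothetical.

```python
from itertools import product
from math import comb


def binom_tail(n_left, p, more_than):
    """P(Bin(n_left, p) > more_than); equals 1 when more_than < 0."""
    start = max(more_than + 1, 0)
    return sum(comb(n_left, j) * p**j * (1 - p) ** (n_left - j)
               for j in range(start, n_left + 1))


def bad_event_check(N, p, m, gammas):
    """Toy model: X_1..X_N i.i.d. Bernoulli(p), Gamma = {x : sum(x) <= m}.
    Returns the exact P(B) for B = not-Gamma or some B_{k-1} (events (24)),
    together with the right-hand side of (25)."""
    prob_not_gamma = binom_tail(N, p, m)
    prob_b = 0.0
    for x in product((0, 1), repeat=N):
        s = sum(x)
        weight = p**s * (1 - p) ** (N - s)
        in_b = s > m                                    # X not in Gamma
        for k in range(1, N + 1):
            # P(X not in Gamma | X_1..X_{k-1}) depends only on the prefix sum
            cond = binom_tail(N - k + 1, p, m - sum(x[:k - 1]))
            if cond > gammas[k - 1]:                    # outcome lies in B_{k-1}
                in_b = True
                break
        if in_b:
            prob_b += weight
    bound = sum(1 / g for g in gammas) * prob_not_gamma
    return prob_b, bound


prob_b, bound = bad_event_check(10, 0.1, 4, [0.1] * 10)
print(f"P(B) = {prob_b:.5f} <= {bound:.5f}")
```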

Let the stopping time $T$ be the minimum of $N$ and the smallest $0 \le k \le N$ for which $\mathcal{B}_k$ holds (note
that $\{T \le k-1\}$ is $\mathcal{F}_{k-1}$-measurable). Setting $M_k = Y_{k \wedge T}$, it follows that the sequence $(M_k)_{0 \le k \le N}$
is a martingale with $Y_0 = M_0 = \mu$. Since $T = N$ unless $\mathcal{B}$ holds, recalling $Y_N = f(X)$ we see that

$$\mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) = \mathbb{P}(Y_N \ge Y_0 + t \text{ and } \neg\mathcal{B}) \le \mathbb{P}(M_N \ge M_0 + t). \tag{26}$$

It remains to establish suitable tail estimates for $\mathbb{P}(M_N \ge M_0 + t)$, and via Lemmas 21 and 22 this
reduces to proving (deterministic) upper bounds on the random variables $S_N$, $V_N$ and $C_N$. To this
end we consider the martingale difference sequences $\Delta M_k = M_k - M_{k-1}$ and $\Delta Y_k = Y_k - Y_{k-1}$,
which satisfy $\mathbb{E}(\Delta M_k \mid \mathcal{F}_{k-1}) = 0$ and $\mathbb{E}(\Delta Y_k \mid \mathcal{F}_{k-1}) = 0$. Set $e_k = \gamma_k(d_k - c_k)$ and $\Delta_k = c_k + e_k$.

Proof of Theorem 2. It suffices to show $\Delta M_k \in [-\Delta_k, \Delta_k]$ for each $k \in [N]$: then the claim follows
by applying Lemma 21 with $S = \sum_{k \in [N]} (2\Delta_k)^2$. The following argument is written with an eye on
the upcoming proofs (where some modifications are needed). Note that $\Delta M_k = 0$ if $T \le k-1$ and
$\Delta M_k = \Delta Y_k$ if $T \ge k$. So it is enough to prove that $|\Delta Y_k| \le \Delta_k$ whenever $T \ge k$. For brevity, for
$z \in \Lambda_k$ and $y = (y_{k+1}, \ldots, y_N) \in \prod_{k < j \le N} \Lambda_j$ we write $f_y(z)$ for $f(X_1, \ldots, X_{k-1}, z, y_{k+1}, \ldots, y_N)$.
Note that

$$\mathbb{E}(f(X) \mid \mathcal{F}_{k-1}, X_k = a) = \sum_{y_{k+1}, \ldots, y_N} f_y(a)\,\mathbb{P}(X_{k+1} = y_{k+1}, \ldots, X_N = y_N \mid \mathcal{F}_{k-1}, X_k = a).$$

Defining $|\Delta Y_k(a,b)|$ via the next equation, since $X_1, \ldots, X_N$ are independent it follows that

$$\begin{aligned}
|\Delta Y_k(a,b)| &= |\mathbb{E}(f(X) \mid \mathcal{F}_{k-1}, X_k = a) - \mathbb{E}(f(X) \mid \mathcal{F}_{k-1}, X_k = b)| \\
&\le \sum_{y_{k+1}, \ldots, y_N} |f_y(a) - f_y(b)|\,\mathbb{P}(X_{k+1} = y_{k+1}, \ldots, X_N = y_N \mid \mathcal{F}_{k-1}, X_k = a).
\end{aligned} \tag{27}$$

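By distinguishing between $X \in \Gamma$ and $X \notin \Gamma$, each time applying (4) as appropriate, we infer

$$\begin{aligned}
|\Delta Y_k(a,b)| &\le c_k\,\mathbb{P}(X \in \Gamma \mid \mathcal{F}_{k-1}, X_k = a) + d_k\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a) \\
&= c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a).
\end{aligned} \tag{28}$$

Recall that $\mathcal{B}_{k-1}$ fails if $T \ge k$. Using $\sum_{y_k} \mathbb{P}(X_k = y_k \mid \mathcal{F}_{k-1}) = 1$ together with (24) and (28), for
$T \ge k$ we deduce

$$\begin{aligned}
|\Delta Y_k| = |\mathbb{E}(f(X) \mid \mathcal{F}_{k-1}) - \mathbb{E}(f(X) \mid \mathcal{F}_k)| &\le \sum_{y_k} |\Delta Y_k(y_k, X_k)|\,\mathbb{P}(X_k = y_k \mid \mathcal{F}_{k-1}) \\
&\le c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) \le c_k + \gamma_k(d_k - c_k) = \Delta_k.
\end{aligned} \tag{29}$$

As explained, this completes the proof. □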
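To see inequality (29) in action, the following sketch computes all Doob martingale increments exactly in a toy model of our own, which is not taken from the paper. The coordinates are $N$ independent Bernoulli($p$) variables. The function is the hypothetical $f(x) = s + K\cdot\mathbb{1}\{s > m\}$ with $s = \sum_k x_k$, and the good event is $\Gamma = \{s \le m-1\}$. On $\Gamma$ a single coordinate changes $f$ by at most $c_k = 1$, while the worst case is $d_k = K + 1$. The check confirms $|\Delta Y_k| \le \Delta_k = c_k + \gamma(d_k - c_k)$ whenever $T \ge k$, even for a huge $K$.

```python
from itertools import product
from math import comb


def cond_expectation(prefix, N, p, m, K):
    """E(f(X) | first len(prefix) coordinates) for f(x) = s + K*[s > m], s = sum(x)."""
    s0, rest = sum(prefix), N - len(prefix)
    return sum(comb(rest, j) * p**j * (1 - p) ** (rest - j) * (s0 + j + K * (s0 + j > m))
               for j in range(rest + 1))


def cond_bad_prob(prefix, N, p, m):
    """P(X not in Gamma | prefix) for the good event Gamma = {sum(x) <= m - 1}."""
    s0, rest = sum(prefix), N - len(prefix)
    return sum(comb(rest, j) * p**j * (1 - p) ** (rest - j)
               for j in range(rest + 1) if s0 + j > m - 1)


def worst_excess(N, p, m, K, gamma):
    """max(|Delta Y_k| - Delta_k) over all k and all prefixes with T >= k, where
    Delta_k = c_k + gamma*(d_k - c_k) = 1 + gamma*K. A non-positive value means (29) held."""
    delta = 1 + gamma * K
    worst = float("-inf")
    for k in range(1, N + 1):
        for prefix in product((0, 1), repeat=k - 1):
            # T >= k  <=>  none of the events B_0, ..., B_{k-1} from (24) occurs
            if any(cond_bad_prob(prefix[:j - 1], N, p, m) > gamma for j in range(1, k + 1)):
                continue
            y_prev = cond_expectation(prefix, N, p, m, K)
            for xk in (0, 1):
                dy = cond_expectation(prefix + (xk,), N, p, m, K) - y_prev
                worst = max(worst, abs(dy) - delta)
    return worst


print(worst_excess(10, 0.1, 5, 1000, 0.01))   # non-positive: |Delta Y_k| <= 11, not 1001
```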

Here we could have used the classical Hoeffding-Azuma inequality [20, 3], since the proof yields
(deterministic) bounds for each individual $\Delta M_k$. We decided to apply Lemma 21 since the forthcoming
modifications needed for the 'dynamic exposure' of Section 1.1.3 do use its full strength,
i.e., the fact that accumulative estimates of the $\Delta M_k$ suffice.

Proof of Remark 3. Following the approach of McDiarmid [25], we now modify the proof of
Theorem 2 whenever $X_k$ takes only two values, say, $\Lambda_k = \{0, 1\}$. We focus on the relevant case
$T \ge k$, where $\Delta M_k = \Delta Y_k$. Define $L_k$ and $U_k$ as the minimum and maximum of
$\mathbb{E}(f(X) \mid \mathcal{F}_{k-1}, X_k = z) - \mathbb{E}(f(X) \mid \mathcal{F}_{k-1})$ for $z \in \{0, 1\}$. Clearly $L_k$ and $U_k$ are $\mathcal{F}_{k-1}$-measurable and satisfy
$L_k \le \Delta Y_k \le U_k$. The key observation is that, using $T \ge k$, there exists an $\mathcal{F}_{k-1}$-measurable
$a \in \{0, 1\}$ satisfying

$$\gamma_k \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a). \tag{30}$$

So, since $X_k \in \{0, 1\}$ takes only two values, using (28) and (30) we infer for $T \ge k$ that

$$|U_k - L_k| \le |\Delta Y_k(a, 1-a)| \le c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a) \le \Delta_k. \tag{31}$$

This completes the proof (by applying Lemma 21 with $S = \sum_{k \in [N]} \Delta_k^2$). □
In fact, Theorem 1 follows by a similar modification (here (28) implies $\max_{a,b} |\Delta Y_k(a,b)| \le c_k$).
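Proof of Theorem 4. In the proof of Theorem 2 we established $\Delta M_k \le \Delta_k$ for every $k \in [N]$. In
view of this it suffices to show $\mathrm{Var}(\Delta M_k \mid \mathcal{F}_{k-1}) \le (1 - p_k)p_k\Delta_k^2$ for each $k \in [N]$: then the claim
follows by applying Lemma 22 with $V = \sum_{k \in [N]} (1 - p_k)p_k\Delta_k^2$ and $C = \max_{k \in [N]} \Delta_k$. Observe
that $\mathbb{E}(\Delta M_k \mid \mathcal{F}_{k-1}) = 0$ implies $\mathrm{Var}(\Delta M_k \mid \mathcal{F}_{k-1}) = \mathbb{E}(\Delta M_k^2 \mid \mathcal{F}_{k-1})$. Recall that $\Delta M_k = 0$ if
$T \le k-1$ and $\Delta M_k = \Delta Y_k$ if $T \ge k$. Combining these facts, it is enough to prove that
$\mathbb{E}(\Delta Y_k^2 \mid \mathcal{F}_{k-1}) \le (1 - p_k)p_k\Delta_k^2$ whenever $T \ge k$. Set $D_k = \mathbb{E}(Y \mid \mathcal{F}_{k-1}, X_k = 1) - \mathbb{E}(Y \mid \mathcal{F}_{k-1}, X_k = 0)$.
Recalling $\Delta Y_k = \mathbb{E}(Y \mid \mathcal{F}_k) - \mathbb{E}(Y \mid \mathcal{F}_{k-1})$, we see that

$$|\Delta Y_k| \le |D_k| \sum_{\beta \in \{0,1\}} \mathbb{P}(X_k = \beta \mid \mathcal{F}_{k-1})\,\mathbb{1}_{\{X_k = 1 - \beta\}}. \tag{32}$$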

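Arguing as in (30) and (31), we readily obtain $|D_k| \le \Delta_k$ when $T \ge k$, and thus infer

$$\Delta Y_k^2 \le \Delta_k^2 \sum_{\beta \in \{0,1\}} \mathbb{P}(X_k = \beta \mid \mathcal{F}_{k-1})^2\,\mathbb{1}_{\{X_k = 1 - \beta\}}.$$

Using the independence of $X_1, \ldots, X_N$, it follows that for $T \ge k$ we have

$$\mathbb{E}(\Delta Y_k^2 \mid \mathcal{F}_{k-1}) \le \Delta_k^2 \sum_{\beta \in \{0,1\}} \mathbb{P}(X_k = \beta)^2\,\mathbb{P}(X_k = 1 - \beta) = (1 - p_k)p_k\Delta_k^2, \tag{33}$$

where we used $\mathbb{P}(X_k = 1) = p_k$ and $(1-x)^2x + x^2(1-x) = (1-x)x$ for the last equality. □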

Note that (32) implies $\Delta M_k \le \max\{1 - p_k, p_k\} \cdot \Delta_k$, but the resulting minor improvement of $C$
usually has negligible effect.
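The variance computation (32)–(33) rests on one identity for a two-valued $X_k$ that is independent of the past: $\Delta Y_k = D_k(\mathbb{1}\{X_k = 1\} - p_k)$, and hence $\mathbb{E}(\Delta Y_k^2 \mid \mathcal{F}_{k-1}) = p_k(1-p_k)D_k^2$ exactly. The sketch below checks this numerically. It uses a hypothetical toy function of our own, $f(x) = s + K\cdot\mathbb{1}\{s > m\}$ with $s = \sum_k x_k$ and i.i.d. Bernoulli($p$) coordinates.

```python
from math import comb


def cond_mean(s0, rest, p, m, K):
    """E(f | exposed coordinates with sum s0, `rest` coordinates unexposed),
    for the toy function f(x) = s + K*[s > m] with i.i.d. Bernoulli(p) coordinates."""
    return sum(comb(rest, j) * p**j * (1 - p) ** (rest - j) * (s0 + j + K * (s0 + j > m))
               for j in range(rest + 1))


def conditional_second_moment(s0, k, N, p, m, K):
    """Return E(Delta Y_k^2 | F_{k-1}) and D_k from (32), given that X_1..X_{k-1} sum to s0."""
    rest = N - k                          # coordinates after X_k
    y1 = cond_mean(s0 + 1, rest, p, m, K)
    y0 = cond_mean(s0, rest, p, m, K)
    y_prev = p * y1 + (1 - p) * y0        # E(Y | F_{k-1})
    second = p * (y1 - y_prev) ** 2 + (1 - p) * (y0 - y_prev) ** 2
    return second, y1 - y0


second, d = conditional_second_moment(3, 5, 12, 0.2, 5, 100)
print(second, 0.2 * 0.8 * d**2)   # equal up to rounding, as used in (33)
```

Combined with $|D_k| \le \Delta_k$ when $T \ge k$, this identity gives exactly the bound (33).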

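Proof of Remark 8. In the more general situation where each $X_k$ takes values in a set $\Lambda_k$ and
satisfies $\max_{\eta \in \Lambda_k} \mathbb{P}(X_k = \eta) \ge 1 - p_k$, we first show that (7) holds after deleting $(1 - p_k)$ and
replacing $\Delta_k = c_k + e_k$ with $\Delta_k = c_k + e_k \cdot (1 - p_k)^{-1}$. With the proof of Theorem 4 in mind, it suffices
to show $\mathrm{Var}(\Delta Y_k \mid \mathcal{F}_{k-1}) \le p_k\Delta_k^2$ whenever $T \ge k$. For $\beta \in \Lambda_k$ satisfying $\mathbb{P}(X_k = \beta) \ge 1 - p_k$,
set $r_k = \mathbb{E}(Y \mid \mathcal{F}_{k-1}, X_k = \beta)$ and $D_k = \mathbb{E}(Y \mid \mathcal{F}_k) - r_k$. Note that, using the independence of
$X_1, \ldots, X_N$, we have $\mathbb{P}(D_k \neq 0 \mid \mathcal{F}_{k-1}) \le \mathbb{P}(X_k \neq \beta) \le p_k$. We claim that it suffices to show
$|D_k| \le \Delta_k$. Indeed, since $Y_{k-1}$ and $r_k$ are $\mathcal{F}_{k-1}$-measurable, we have

$$\mathrm{Var}(\Delta Y_k \mid \mathcal{F}_{k-1}) = \mathrm{Var}(D_k \mid \mathcal{F}_{k-1}) \le \mathbb{E}(D_k^2 \mid \mathcal{F}_{k-1}) \le \Delta_k^2\,\mathbb{P}(D_k \neq 0 \mid \mathcal{F}_{k-1}) \le p_k\Delta_k^2.$$

To bound $|D_k|$, first note that $T \ge k$ and the independence of $X_1, \ldots, X_N$ yield

$$\gamma_k \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = \beta)(1 - p_k). \tag{34}$$

So, using (28) and (34), for $T \ge k$ we infer

$$\begin{aligned}
|D_k| = |\mathbb{E}(Y \mid \mathcal{F}_{k-1}, X_k = \beta) - \mathbb{E}(Y \mid \mathcal{F}_k)| &= |\Delta Y_k(\beta, X_k)| \\
&\le c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = \beta) \le c_k + \gamma_k(1 - p_k)^{-1} \cdot (d_k - c_k) = \Delta_k,
\end{aligned} \tag{35}$$

establishing the claim.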

A similar argument shows that (9) holds after deleting $(1 - p_k)$. The point is that in Corollary 6
there is no 'good' event $\Gamma$. Consequently, when invoking (28) in (35), the standard line of reasoning
(using (1) instead of (4)) yields $|D_k| \le c_k$, and the claim follows. □
Proof of Theorem 9. The crux is that (28) and (29) are at the heart of all previous proofs. In
the following we exploit that both can be adapted when (10) holds instead of (4): it suffices to use
$e_k = 2\gamma_k(d_k - c_k)q_k^{-1}$, where $\min_{\eta \in \Lambda_k} \mathbb{P}(X_k = \eta) \ge q_k$.

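We start by modifying the proof of Theorem 2. Analogous to (34), if $T \ge k$ then for all $b \in \Lambda_k$
we have

$$\gamma_k \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) \ge \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = b)\,q_k.$$

The key point of (10) is that $|f(x) - f(\tilde{x})| \le c_k$ only holds if $x, \tilde{x} \in \Gamma$. So, using (10) as appropriate,
the corresponding variant of (28) for $T \ge k$ is

$$\begin{aligned}
|\Delta Y_k(a,b)| &\le c_k + (d_k - c_k)\bigl[\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a) + \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = b)\bigr] \\
&\le c_k + (d_k - c_k)\bigl[\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a) + \gamma_k q_k^{-1}\bigr].
\end{aligned} \tag{36}$$

Now, arguing as in (29) and using $1 + q_k^{-1} \le 2q_k^{-1}$, we obtain a natural analogue for $T \ge k$, namely

$$|\Delta Y_k| \le c_k + (d_k - c_k)\bigl[\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) + \gamma_k q_k^{-1}\bigr] \le c_k + 2\gamma_k(d_k - c_k)q_k^{-1} = \Delta_k, \tag{37}$$

which establishes the claimed variant of Theorem 2.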

In the proofs of Remark 3 and Theorem 4 we only need to adapt (31), and using (36) this follows
by straightforward modifications. Similarly, in the proof of Remark 8 it suffices to modify (35),
which is standard using (37) together with $(1 - p_k)^{-1} + q_k^{-1} \le 2q_k^{-1}(1 - p_k)^{-1}$. □



2.2.2 Some extensions 

Proof of Remark 5. Note that the event $f(X) \ge \mu + t$ is increasing (decreasing) if $f(X)$ is increasing
(decreasing). Furthermore, in view of (24) it is easy to check that $\mathcal{B}_{k-1}$ is increasing (decreasing) if $\Gamma$
is decreasing (increasing). Using the definition of $\mathcal{B}$ and the assumptions of Remark 5, it follows that
$f(X) \ge \mu + t$ and $\neg\mathcal{B}$ are either both increasing or both decreasing. So Harris' inequality [18] yields

$$\mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) \ge \mathbb{P}(f(X) \ge \mu + t) \cdot \mathbb{P}(\neg\mathcal{B}),$$

which readily establishes the bound claimed in Remark 5. □

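Proof of Theorem 11. The basic idea is to use a truncation that maps every $X_k \notin \Gamma_k$ to some fixed
$z_k \in \Gamma_k$. As before, we work with the sub-$\sigma$-fields $\mathcal{F}_k$ generated by $X_1, \ldots, X_k$. Recall that $\mathcal{B}$ is
defined via (24) and satisfies $\neg\mathcal{B} \subseteq \Gamma$. Since $\mathbb{P}(f(X) \ge \mu + t \text{ and } \neg\mathcal{B}) \le \mathbb{P}(X \in \Gamma)$, we may assume
that $\mathbb{P}(X \in \Gamma) > 0$ and fix some $z = (z_1, \ldots, z_N) \in \Gamma \subseteq \prod_{j \in [N]} \Gamma_j$. For $x = (x_1, \ldots, x_N)$ we now
define $x^* = (x_1^*, \ldots, x_N^*)$ via

$$x_k^* = \begin{cases} x_k & \text{if } x_k \in \Gamma_k, \\ z_k & \text{if } x_k \notin \Gamma_k. \end{cases} \tag{38}$$

The key properties of this construction are (a) that $x \in \Gamma$ implies $x^* = x$, and (b) that $X^* = (X_1^*, \ldots, X_N^*)$
is a family of independent random variables. Set $\mu^* = \mathbb{E}f(X^*)$. We have
$|\mu - \mu^*| \le \mathbb{E}|f(X) - f(X^*)| \le s\,\mathbb{P}(X \notin \Gamma) = \Delta$ (this is not best possible but keeps the formulas simple), and
so $\neg\mathcal{B} \subseteq \Gamma$ yields

$$\mathbb{P}(f(X) \ge \mu + t + \Delta \text{ and } \neg\mathcal{B}) \le \mathbb{P}(f(X^*) \ge \mu^* + t \text{ and } \neg\mathcal{B}). \tag{39}$$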
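The truncation (38) is a coordinate-wise projection. The sketch below implements it on a hypothetical toy instance of our own: $\Lambda_k = \{0, 1, 2\}$, $\Gamma_k = \{0, 1\}$, $\Gamma = \{x \in \prod_k \Gamma_k : \sum_k x_k \le 2\}$ and $z = (0, 0, 0) \in \Gamma$. It then checks three facts on every point. First, $x^*$ always lies in $\prod_j \Gamma_j$. Second, property (a) holds. Third, the pointwise implication "$x^* \notin \Gamma \Rightarrow x \notin \Gamma$" holds, which the proof uses afterwards.

```python
from itertools import product


def truncate(x, good_sets, z):
    """Projection (38): keep x_k if it lies in Gamma_k, otherwise replace it by z_k."""
    return tuple(xk if xk in gk else zk for xk, gk, zk in zip(x, good_sets, z))


# Hypothetical toy instance (not from the paper)
good_sets = [{0, 1}] * 3          # Gamma_k = {0, 1} inside Lambda_k = {0, 1, 2}
z = (0, 0, 0)                     # a fixed point of Gamma


def in_gamma(x):
    """Gamma = {x in prod Gamma_k : sum(x) <= 2}, so Gamma is contained in prod Gamma_k."""
    return all(xk in gk for xk, gk in zip(x, good_sets)) and sum(x) <= 2


for x in product(range(3), repeat=3):
    x_star = truncate(x, good_sets, z)
    assert all(xk in gk for xk, gk in zip(x_star, good_sets))   # x* lies in prod Gamma_k
    if in_gamma(x):
        assert x_star == x                                       # property (a)
    if not in_gamma(x_star):
        assert not in_gamma(x)                                   # x* not in Gamma => x not in Gamma
```

Property (b) is immediate from the construction, since $X_k^*$ is a function of $X_k$ alone.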
Now we estimate the right-hand side of (39) for $Y = f(X^*)$ as in the proof of Theorem 2 via (26),
and there are only two minor differences. The first is that, due to the projection (38), we have
$X^* \in \prod_{j \in [N]} \Gamma_j$, so that the 'refined' Lipschitz coefficients $c_k$ and $d_k$ of Theorem 11 always apply.
The second concerns the case distinction $X^* \in \Gamma$ and $X^* \notin \Gamma$. Here we use that $X^* \notin \Gamma$ implies
$X \notin \Gamma$ pointwise, which yields $\mathbb{P}(X^* \notin \Gamma \mid \mathcal{F}_{k-1}) \le \mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1})$. With this estimate the
conclusion of (29) readily carries over, completing the proof. □

Here we could have estimated $f(X^*) \ge \mu^* + t$ via Theorem 2 (with $\Lambda_k$ replaced by $\Gamma_k$), using
that $\mathbb{P}(X^* \notin \Gamma) \le \mathbb{P}(X \notin \Gamma)$. The advantage of our more pedestrian approach is that it uses the
same 'bad' event $\mathcal{B}$ as all other proofs (by applying Theorem 2, the bad event would depend on $z$).

Proof of Remark 12. The monotonicity property implies $\mu \ge \mu^*$ (writing $f(X) - f(X^*)$ as a
difference sequence of coordinate changes), so $\Delta = 0$ suffices, using $\mu + t \ge \mu^* + t$ in (39). Turning
to the special case $\Gamma = \prod_{j \in [N]} \Gamma_j$, note that $\mathcal{B} = \neg\Gamma$ suffices to establish (39). Now, since
$X^* = (X_1^*, \ldots, X_N^*)$ is a family of independent random variables with $X_k^* \in \Gamma_k$ satisfying (L), the
claimed variant readily follows from Theorem 1. □

2.2.3 Variants using dynamic exposure 

In the following we briefly sketch how to modify the proofs of Sections 2.2.1 and 2.2.2 in case the
variables are exposed in a dynamic order, which will eventually establish Theorems 13 and 14 (in
contrast to [5], our approach is based on general martingale inequalities). Recall that the strategies
introduced in Section 1.1.3 sequentially expose $X_{q_1}, X_{q_2}, \ldots$ with $q_i = q_i(X_{q_1}, \ldots, X_{q_{i-1}})$, where
$f(X)$ is determined by $(X_{q_1}, \ldots, X_{q_k})$ with $k < N$ if $q_{k+1} = q_k$. For technical reasons we slightly
modify these strategies so that (always) all variables are queried. More precisely, for the proof of
Theorem 13 we set $\tilde{q}_k = q_k$ until $f(X)$ is determined by $(X_{q_1}, \ldots, X_{q_k})$; afterwards $\tilde{q}_{k+1}, \ldots, \tilde{q}_N$
equal the remaining 'useless' indices $[N] \setminus \{q_1, \ldots, q_k\}$ in ascending order, say (for the proof of
Theorem 14 we simply use the fixed order $\tilde{q}_k = k$ for all $k \in [N]$). We consider an increasing
sequence of sub-$\sigma$-fields, where $\mathcal{F}_k$ is generated by $X_{\tilde{q}_1}, \ldots, X_{\tilde{q}_k}$. Note that each index $\tilde{q}_k$ is $\mathcal{F}_{k-1}$-measurable.
Furthermore, our modification ensures that the following two key properties hold: $X$ is
$\mathcal{F}_N$-measurable (this is needed to apply $\Gamma$, since the value of $f(X)$ need not uniquely determine $X$),
and conditional on $\mathcal{F}_{k-1}$ all $X_j$ with $j \notin \{\tilde{q}_1, \ldots, \tilde{q}_{k-1}\}$ are independent random variables. Define
$R = \max_{Q \in \mathcal{Q}} |Q|$ and $\mathcal{B} = \neg\Gamma \cup \bigcup_{k \in [R]} \mathcal{B}_{k-1}$ (in the case of the fixed order $\tilde{q}_k = k$ we set $R = N$, so $\mathcal{B}$
remains unchanged). Since the definition of $\mathcal{B}_{k-1}$ via (24) involves $\mathcal{F}_{k-1}$, it follows that $\mathcal{B}$ depends
on the query strategy (unless the fixed order $\tilde{q}_k = k$ is used, as in the proof of Theorem 14).
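The modified query order $\tilde{q}_1, \ldots, \tilde{q}_N$ is easy to state as code. The minimal sketch below assumes that the original strategy is summarised by the list of distinct indices it queried until $f(X)$ was determined. That representation, and the function name, are our own hypothetical choices.

```python
def completed_query_order(queried, N):
    """Return (q~_1, ..., q~_N): the indices queried by the original strategy (in the
    order queried, until f(X) is determined), followed by the remaining 'useless'
    indices of [N] = {1, ..., N} in ascending order, so that every variable is queried."""
    assert len(set(queried)) == len(queried), "assumes no index is queried twice"
    remaining = sorted(set(range(1, N + 1)) - set(queried))
    return list(queried) + remaining


print(completed_query_order([3, 1, 4], 6))   # [3, 1, 4, 2, 5, 6]
```

With an empty list the function returns the fixed order $\tilde{q}_k = k$, which matches the choice used for Theorem 14.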

With these changes in mind, all arguments of Sections 2.2.1 and 2.2.2 essentially carry over
word by word, the only exception being the proof of Remark 5 (the monotonicity argument needs
a fixed order such as $\tilde{q}_k = k$). The crucial observation is that for every variable $X_q$ not queried
by the original strategy, we know that its value will not change the outcome of $f(X)$. More formally,
the key point is that whenever such a 'useless' variable is queried in step $i$ we have
$\Delta M_i = 0$ (note that for $i > R$ this is always the case), i.e., the indices of these variables do not
contribute to $S_N$, $V_N$ or $C_N$. Observe that due to the dynamic exposure we 'only' have a connection
between $\gamma_k$ and the index $\tilde{q}_k$. We overcome this minor complication using the assumption that
$\gamma_k = \gamma$ for all $k \in [N]$ (we may allow for different $\gamma_k$ if $\tilde{q}_k = k$ is used), which also ensures that
$\max_{Q \in \mathcal{Q}} \sum_{k \in Q} \gamma_k^{-1} = \sum_{k \in [R]} \gamma_k^{-1}$ holds (in the case of $\tilde{q}_k = k$ the estimate (25) stays unchanged).

The remaining details for establishing Theorems 13 and 14 are rather straightforward: when
invoking the martingale estimates we simply take the 'worst case' bounds for $S$, $V$ and $C$ over all
possible sets of queried indices $Q \in \mathcal{Q}$ (where $Q$ and $\mathcal{Q}$ are as defined in Section 1.1.3), for example
using $S = \max_{Q \in \mathcal{Q}} \sum_{k \in Q} (2\Delta_k)^2$ in the case of Theorem 2. It is in this last step that the accumulative
random bounds in Lemmas 21 and 22 are crucial (the behaviour of each individual $\Delta M_k$ may
vary significantly for different sample points due to the dynamic order in which the variables are
queried).

2.2.4 Variants using the general Lipschitz condition 

Finally, we discuss how to modify the proofs in Section 2.2.1 when the independence assumption
is replaced by (GL). We first claim that $\rho_k = \rho_k(\Sigma_a, \Sigma_b)\colon \Sigma_a \to \Sigma_b$ is a bijection with equality in
(13), i.e., that it satisfies

$$\mathbb{P}(X = x \mid X \in \Sigma_a) = \mathbb{P}(X = \rho_k(x) \mid X \in \Sigma_b). \tag{40}$$

Indeed, using (13) and the injectivity of $\rho_k$, it follows that

$$\sum_{x \in \Sigma_a} \mathbb{P}(X = x \mid X \in \Sigma_a) \le \sum_{x \in \Sigma_a} \mathbb{P}(X = \rho_k(x) \mid X \in \Sigma_b) \le \sum_{x \in \Sigma_b} \mathbb{P}(X = x \mid X \in \Sigma_b).$$

Noting that $\sum_{x \in \Sigma_z} \mathbb{P}(X = x \mid X \in \Sigma_z) = 1$ for $z \in \{a, b\}$ with $|\Sigma_z| > 0$, we infer that all inequalities
are in fact equalities, which establishes (40). Since every $x \in \Sigma_z$ satisfies $\mathbb{P}(X = x) > 0$, it also
follows that $\rho_k$ must be a bijection, as claimed.

In preparation for our forthcoming arguments, we now relate the definitions used in the proofs
of Section 2.2.1 to those occurring in (GL). Analogous to $\Sigma_z$, given any possible sequence of
outcomes $a_1, \ldots, a_{k-1}$ of $X_1, \ldots, X_{k-1}$, define $\Sigma$ as the set of all $x = (a_1, \ldots, a_{k-1}, x_k, \ldots, x_N) \in \prod_{j \in [N]} \Lambda_j$
with $\mathbb{P}(X = x) > 0$. Recall that $\mathcal{F}_k$ is the increasing sequence of sub-$\sigma$-fields generated
by $X_1, \ldots, X_k$. The key point is that $(\mathcal{F}_k)_{0 \le k \le N}$ naturally corresponds to an increasing sequence
of partitions of the sample space, where two points belong to the same part if and only if they agree
on the first $k$ coordinates. For example, for $\omega \in \Sigma$ we have

$$\mathbb{E}(\cdot \mid \mathcal{F}_{k-1}, X_k = z)(\omega) = \mathbb{E}(\cdot \mid X \in \Sigma_z) \quad\text{and}\quad \mathbb{P}(\cdot \mid \mathcal{F}_{k-1}, X_k = z)(\omega) = \mathbb{P}(\cdot \mid X \in \Sigma_z). \tag{41}$$

Proof of Theorem 15. We modify the proof of Theorem 2, where independence is only used to
establish (27). Using (41) and the fact that the bijection $\rho_k\colon \Sigma_a \to \Sigma_b$ satisfies (40), we obtain

$$\begin{aligned}
|\Delta Y_k(a,b)| &= |\mathbb{E}(f(X) \mid X \in \Sigma_a) - \mathbb{E}(f(X) \mid X \in \Sigma_b)| \\
&= \Bigl|\sum_{x \in \Sigma_a} f(x)\,\mathbb{P}(X = x \mid X \in \Sigma_a) - \sum_{x \in \Sigma_b} f(x)\,\mathbb{P}(X = x \mid X \in \Sigma_b)\Bigr| \\
&\le \sum_{x \in \Sigma_a} |f(x) - f(\rho_k(x))|\,\mathbb{P}(X = x \mid X \in \Sigma_a),
\end{aligned}$$

which is the natural analogue of (27). The remainder of the argument carries over with minor
modifications. Indeed, proceeding as in (28) (applying (12) instead of (4)) and then appealing to
(41), we infer

$$|\Delta Y_k(a,b)| \le c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}, X_k = a). \tag{42}$$

Now, arguing as in (29), when $T \ge k$ holds we also have

$$|\Delta Y_k| \le c_k + (d_k - c_k)\,\mathbb{P}(X \notin \Gamma \mid \mathcal{F}_{k-1}) \le \Delta_k, \tag{43}$$

completing the proof. □
Remark 16 follows by similar reasoning (noting that the proof of Remark 3 carries over and that
(42) equals (14) after replacing $(d_k - c_k)$ with $f_k$).

Proof of Theorem 17. We modify the proof of Remark 8 by picking (some) $\mathcal{F}_{k-1}$-measurable $\beta \in \Lambda_k$
maximizing $\mathbb{P}(X_k = \beta \mid \mathcal{F}_{k-1})$. By assumption we have $\mathbb{P}(X_k = \beta \mid \mathcal{F}_{k-1}) \ge 1 - p_k$, which in turn
yields $\mathbb{P}(D_k \neq 0 \mid \mathcal{F}_{k-1}) \le \mathbb{P}(X_k \neq \beta \mid \mathcal{F}_{k-1}) \le p_k$. Noting that all remaining applications of
independence are already covered by (42) and (43), this completes the proof. □

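Proof of Theorem 18. With the above modifications in mind, the proof of Theorem 9 carries over
word by word, which establishes the first part of the claim. Turning to the second part, our earlier
discussion shows that for $\xi, \eta \in \Lambda_k$ with $|\Sigma_\xi|, |\Sigma_\eta| > 0$ there is a bijection $\rho_k\colon \Sigma_\xi \to \Sigma_\eta$. So,
since all possible outcomes occur with the same probability, we obtain $\mathbb{P}(X \in \Sigma_\xi) = \mathbb{P}(X \in \Sigma_\eta)$.
For $\eta \in \Lambda_k$ satisfying $|\Sigma_\eta| > 0$ it follows that

$$\frac{1}{\mathbb{P}(X_k = \eta \mid X \in \Sigma)} = \frac{\mathbb{P}(X \in \Sigma)}{\mathbb{P}(X \in \Sigma_\eta)} = \sum_{\xi \in \Lambda_k} \frac{\mathbb{P}(X \in \Sigma_\xi)}{\mathbb{P}(X \in \Sigma_\eta)} \le |\Lambda_k|,$$

since each summand is either $0$ or $1$. We deduce that $\min_{\eta \in \Lambda_k} \mathbb{P}(X_k = \eta \mid X_1, \ldots, X_{k-1}) \ge |\Lambda_k|^{-1}$, so $q_k \le |\Lambda_k|^{-1}$ suffices. □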



3 Final number of edges in the reverse H-free process 

In our analysis of the reverse $H$-free process we use several equivalent definitions (with respect to
the final graph). Recall that, starting with the complete graph on vertex set $[n]$, in each step an
edge is removed, chosen uniformly at random from all edges contained in a copy of $H$. As in [16, 27],
a moment's thought reveals that we may instead traverse all edges in random order, each time
removing the current edge if and only if it is contained in a copy of $H$ in the evolving graph. As
observed by Erdős, Suen and Winkler [16], after considering $e_{\binom{n}{2}}, \ldots, e_{i+1}$ the decision whether $e_i$ is
removed depends only on the later edges $e_{i-1}, \ldots, e_1$ (all other 'surviving' ones are by construction
not contained in a copy of $H$). This allows us to consider the edges in reverse order, where $e_i$ is
added if and only if it does not complete a copy of $H$ together with $e_1, \ldots, e_{i-1}$ (it does not matter
whether these were added or not). Given a random permutation, we denote the corresponding
random graph process after $i$ steps by $G_{n,i}(H) \subseteq G_{n,i}$, where $G_{n,i}$ is the uniform random graph
with $n$ vertices and $i$ edges.
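The reverse-order formulation is convenient to simulate because each edge is examined exactly once. The sketch below handles the case $H = K_3$ and follows the description above. A random permutation $e_1, e_2, \ldots$ is traversed, and $e_i$ is added if and only if it does not complete a triangle together with $e_1, \ldots, e_{i-1}$, whether or not those earlier edges were added. The function name is hypothetical.

```python
import random
from itertools import combinations


def reverse_order_process(n, seed=None):
    """Reverse-order formulation for H = K_3: traverse a uniformly random permutation
    e_1, e_2, ... of all pairs and add e_i iff it does not complete a triangle together
    with e_1, ..., e_{i-1} (regardless of whether those edges were added)."""
    rng = random.Random(seed)
    order = list(combinations(range(n), 2))
    rng.shuffle(order)
    seen = [set() for _ in range(n)]     # neighbourhoods in the graph of ALL traversed edges
    added = []
    for u, v in order:
        if not (seen[u] & seen[v]):      # no common neighbour, so no triangle is completed
            added.append((u, v))
        seen[u].add(v)
        seen[v].add(u)
    return added


print(len(reverse_order_process(200, seed=0)))
```

The output is always triangle-free. If three added edges formed a triangle, the last of them would complete that triangle with two earlier edges, so it would not have been added.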

For technical reasons it will be convenient to also consider a continuous variant of the above process, where each edge $e$ is independently assigned a uniform birth time $B_e \in [0,1]$; the edges are then traversed in ascending order of their birth times (which are all distinct with probability one). The resulting process that considers only those edges with $B_e \le p$ is denoted by $G_{n,p}(H) \subseteq G_{n,p}$. So for $p = 1$ all edges are traversed in random order, and it follows that

$$G_{n,\binom{n}{2}}(H) = G_{n,1}(H). \tag{44}$$
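A similarly hedged sketch of the birth-time variant, again only for the assumed case $H = K_3$ and with hypothetical names. Each edge receives an independent uniform birth time. Edges are processed in increasing order of $B_e$, and only edges with $B_e \le p$ are considered. Because the decision for $e$ looks at every earlier edge rather than only the added ones, $e$ is kept iff no vertex $w$ has both $uw$ and $vw$ born before $B_e$. The two checks at the end test this local rule and the fact that the process with threshold $p$ is the $p = 1$ process restricted to $\{B_e \le p\}$.

```python
import itertools
import random

def birth_times(n, rng):
    """Independent uniform birth times B_e in [0, 1) for all edges of K_n."""
    return {e: rng.random() for e in itertools.combinations(range(n), 2)}

def continuous_process(n, birth, p):
    """G_{n,p}(K_3): traverse edges with B_e <= p in increasing birth order and
    add e iff it closes no triangle with the previously traversed edges."""
    seen = {v: set() for v in range(n)}
    added = set()
    for e in sorted(birth, key=birth.__getitem__):
        if birth[e] > p:
            break                      # all remaining edges are born after p
        u, v = e
        if not (seen[u] & seen[v]):
            added.add(e)
        seen[u].add(v)
        seen[v].add(u)
    return added

def kept_by_local_rule(n, birth, e):
    """True iff there is no w with both uw and vw born before B_e."""
    u, v = e
    for w in range(n):
        if w in (u, v):
            continue
        uw, vw = tuple(sorted((u, w))), tuple(sorted((v, w)))
        if birth[uw] < birth[e] and birth[vw] < birth[e]:
            return False
    return True

rng = random.Random(3)
n = 12
birth = birth_times(n, rng)
full = continuous_process(n, birth, 1.0)
assert all((e in full) == kept_by_local_rule(n, birth, e) for e in birth)
for p in (0.2, 0.5, 0.8):
    assert continuous_process(n, birth, p) == {e for e in full if birth[e] <= p}
```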

Conditioned on $B_e = q$, the decision whether $e$ is added only depends on the edges $f$ with $B_f < q$, which have the same distribution as $G_{n,q}$. As noted by Makai [27], this allows for the use of classical random graph theory when estimating the probability that an edge is added to the evolving graph. Recall that $m_2(H) = d_2(H)$ for 2-balanced graphs $H$. For

$$m = n^{2-1/m_2(H)} (\log n)^2 \quad\text{and}\quad p = n^{-1/m_2(H)} (\log n)^2$$

the next lemma follows from the results of Spencer [35] mentioned in Section 1.2.2. Note that in $G_{n,p}$ every pair of vertices is expected to have $\Theta((\log n)^{2(e_H-1)})$ 'extensions' to copies of $H$.
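The exponent bookkeeping behind the $\Theta((\log n)^{2(e_H-1)})$ count can be spelled out as follows. This is a routine check that uses only $m_2(H) = d_2(H) = (e_H-1)/(v_H-2)$ for 2-balanced $H$ with $v_H \ge 3$ and the choice of $p$ above: a pair $xy$ has $\Theta(n^{v_H-2})$ potential extensions, each of which needs $e_H - 1$ further edges and is therefore present with probability $p^{e_H-1}$.

```latex
n^{v_H-2}\,p^{e_H-1}
  = n^{v_H-2}\cdot n^{-(e_H-1)/m_2(H)}\,(\log n)^{2(e_H-1)}
  = n^{(v_H-2)-(v_H-2)}\,(\log n)^{2(e_H-1)}
  = (\log n)^{2(e_H-1)} .
```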

Lemma 26. Let $H$ be a 2-balanced graph. Let $\mathcal{D}$ (and $\mathcal{I}$) denote the event that for every pair $xy$ of vertices the following holds: after adding the edge $xy$ there are at most $\Psi_H = (\log n)^{2e_H}$ copies (there is at least one copy) of $H$ containing the edge $xy$. For every $c > 0$ we have $\mathbb{P}(G_{n,m} \in \mathcal{D} \cap \mathcal{I}) \ge 1 - n^{-c}$ for $n \ge n_0(c,H)$. □

The point is that whenever $\mathcal{I}$ holds, no further edges are added. This allows us to couple both variants of the reverse $H$-free process such that they agree with very high probability after considering only $m$ edges. So for our purposes they are interchangeable, and we obtain the corresponding formal statement by combining Lemma 26 with (44).

Lemma 27. Let $H$ be a 2-balanced graph. There is a coupling such that for every $c > 0$ we have

$$G_{n,m}(H) = G_{n,\binom{n}{2}}(H) = G_{n,1}(H)$$

with probability at least $1 - n^{-c}$ for $n \ge n_0(c,H)$. □
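The deterministic mechanism behind this coupling can be checked in a small sketch, again for the assumed case $H = K_3$ and with hypothetical function names. Once $G_{n,i}$ satisfies $\mathcal{I}$ (every pair already has a common neighbour), every later edge closes a triangle, so the process after $i$ steps already equals the final graph. This illustrates only the 'no further edges' step, not the probability estimate of Lemma 26.

```python
import itertools
import random

def reverse_k3_free(n, order):
    """Edges kept after traversing `order`: e_i is added iff it closes no
    triangle with e_1, ..., e_{i-1}."""
    seen = {v: set() for v in range(n)}
    added = set()
    for u, v in order:
        if not (seen[u] & seen[v]):
            added.add((u, v))
        seen[u].add(v)
        seen[v].add(u)
    return added

def first_step_with_I(n, order):
    """Smallest i such that every pair xy has a common neighbour in G_{n,i}
    (the event I for H = K_3), or None if this never happens."""
    pairs = list(itertools.combinations(range(n), 2))
    adj = {v: set() for v in range(n)}
    for i, (u, v) in enumerate(order, start=1):
        adj[u].add(v)
        adj[v].add(u)
        if all(adj[x] & adj[y] for x, y in pairs):
            return i
    return None

rng = random.Random(1)
n = 10
order = list(itertools.combinations(range(n), 2))
rng.shuffle(order)
i_star = first_step_with_I(n, order)     # K_n itself satisfies I for n >= 3
assert reverse_k3_free(n, order[:i_star]) == reverse_k3_free(n, order)
```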

Turning to the number of edges in $G_{n,m}(H)$, which we denote by $e(G_{n,m}(H))$, recall that each $e_i$ is added if and only if it does not complete a copy of $H$ together with $e_1, \ldots, e_{i-1}$. So one edge can, in the worst case, influence the decisions of up to $O(\min\{m, n^{v_H-2}\})$ edges (whether they are added or not); however, on the 'typical' event $\mathcal{D}$ of Lemma 26 this is limited to at most $e_H \cdot \Psi_H = O((\log n)^{2e_H})$ edges. For this reason the standard bounded differences inequality fails to give useful bounds (due to the large worst-case $C_k$), whereas a routine application of the typical bounded differences inequality yields sharp concentration, illustrating its ease of use and effectiveness.
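The absence of cascading effects can be seen in a small sketch (assumed case $H = K_3$, hypothetical names). Since the decision for $e_i$ depends only on the set $\{e_1, \ldots, e_{i-1}\}$ and not on earlier decisions, replacing one edge $e_j$ can change only the decision for position $j$ itself and for later edges that form a triangle with $e_j$ (or its replacement) and some earlier edge. The helper `influenced` lists exactly those later positions; on the event $\mathcal{D}$ their number is what the typical Lipschitz coefficient bounds.

```python
import itertools
import random

def decisions(n, order):
    """decisions[i] is True iff order[i] closes no triangle with order[:i]."""
    seen = {v: set() for v in range(n)}
    out = []
    for u, v in order:
        out.append(not (seen[u] & seen[v]))
        seen[u].add(v)
        seen[v].add(u)
    return out

def influenced(order, j):
    """Positions i > j such that order[j], order[i] and some edge placed before
    position i form a triangle."""
    pos = {e: k for k, e in enumerate(order)}
    a, b = order[j]
    result = set()
    for i in range(j + 1, len(order)):
        x, y = order[i]
        shared = {a, b} & {x, y}
        if len(shared) != 1:
            continue
        p, q = ({a, b} - shared).pop(), ({x, y} - shared).pop()
        third = (min(p, q), max(p, q))   # edge closing the triangle
        if pos.get(third, len(order)) < i:
            result.add(i)
    return result

rng = random.Random(7)
n, m = 14, 40
all_edges = list(itertools.combinations(range(n), 2))
rng.shuffle(all_edges)
order = all_edges[:m]
j = 5
changed_order = order.copy()
changed_order[j] = all_edges[m]          # swap in an edge not used so far
d1, d2 = decisions(n, order), decisions(n, changed_order)
changed = {i for i in range(m) if d1[i] != d2[i]}
assert changed <= {j} | influenced(order, j) | influenced(changed_order, j)
```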

Theorem 28. Let $H$ be a 2-balanced graph. For every $c > 0$ and $n \ge n_0(c,H)$ we have

$$\mathbb{P}\Big(\big|e(G_{n,m}(H)) - \mathbb{E}\,e(G_{n,m}(H))\big| \ge \sqrt{m}\,(\log n)^{3e_H}\Big) \le n^{-c}. \tag{45}$$

Proof. Lemma 26 implies that $G_{n,m} \in \mathcal{D}$ holds with probability at least $1 - n^{-(2c+6)}$. Note that the random sequence of edges $\mathbf{e} = (e_1, \ldots, e_m)$ corresponds to the (uniform) random graph process $(G_{n,i})_{0 \le i \le m}$ and uniquely determines $f(\mathbf{e}) = e(G_{n,m}(H))$. The crucial observation is that whenever $G_{n,m}, G'_{n,m} \in \mathcal{D}$ have edge sequences $\mathbf{e}, \mathbf{e}'$ that differ only in one edge (i.e., $e_j \neq e'_j$) or in the order of two edges (i.e., $e_j = e'_k$ and $e_k = e'_j$), our earlier observations imply $|e(G_{n,m}(H)) - e(G'_{n,m}(H))| \le 2e_H(\log n)^{2e_H} = \Delta$. The point is that, by the discussion of Section 1.1.4, this is exactly the condition that needs to be checked in order to apply Theorem 15 (with the two-sided Lipschitz condition of Theorem 18) using $N = m$, the 'good' event $\mathcal{D}$, Lipschitz coefficients $C_k = \Delta$, $d_k = n^2$ and the 'two-sided parameter' $q_k = n^{-2}$. For the 'compensation factor' $\gamma_k = n^{-4}$ we have $e_k \le 2\gamma_k d_k q_k^{-1} \le 2$, $C_k + e_k = O((\log n)^{2e_H})$ and $\sum_k \gamma_k^{-1} \le n^6$. Using these estimates in Theorem 15, we deduce that the left-hand side of (45) is at most $e^{-\Omega((\log n)^{2e_H})} + n^{-2c} \le n^{-c}$. □
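For convenience, the parameter arithmetic in this proof can be written out explicitly. The first three lines only restate the choices above (with $m \le \binom{n}{2} \le n^2$). The last line assumes, as a hedge, that the main term of Theorem 15 has the usual form $\exp\big(-t^2/(c'\sum_k (C_k+e_k)^2)\big)$ for some constant $c' > 0$; that statement is not reproduced in this section.

```latex
\begin{aligned}
e_k &\le 2\gamma_k d_k q_k^{-1} = 2\cdot n^{-4}\cdot n^{2}\cdot n^{2} = 2,\\
\sum_{k=1}^{m}\gamma_k^{-1} &= m\,n^{4} \le n^{2}\cdot n^{4} = n^{6},\\
\Big(\sum_{k=1}^{m}\gamma_k^{-1}\Big)\cdot\mathbb{P}(G_{n,m}\notin\mathcal{D})
  &\le n^{6}\cdot n^{-(2c+6)} = n^{-2c},\\
\frac{t^2}{\sum_{k=1}^{m}(C_k+e_k)^2}
  &= \frac{m(\log n)^{6e_H}}{m\cdot O\big((\log n)^{4e_H}\big)}
   = \Omega\big((\log n)^{2e_H}\big)
  \qquad\text{for } t=\sqrt{m}\,(\log n)^{3e_H}.
\end{aligned}
```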

To establish Theorem 19 it remains to bound the expected final number of edges up to constant factors. Our argument is inspired by Makai [27], who proved asymptotically matching bounds in (46) for the class of strictly 2-balanced graphs (the case $H = K_3$ is due to Erdős, Suen and Winkler [16]). In fact, here we determine the correct order of magnitude for all graphs.
Theorem 29. Let $H$ be a graph with $e_H \ge 1$. There are $a, A > 0$ such that

$$\big\lfloor a\, n^{2-1/m_2(H)} \big\rfloor \le \mathbb{E}\,e(G_{n,1}(H)) \le A\, n^{2-1/m_2(H)} \tag{46}$$

for $n \ge n_0(H)$, where the floor function is only needed when $e_H = 1$.

Proof. When $e_H = 1$ we have $\mathbb{E}\,e(G_{n,1}(H)) = 0$ and $m_2(H) = 1/2$, so (46) holds with, say, $a = A = 1/2$. Henceforth we assume $e_H \ge 2$. Define $Z_e = Z_e(H)$ as the event that the edge $e$ is contained in $G_{n,1}(H)$. Let $Y_{e,H,q}$ count the number of copies of $H$ containing $e$ in $G_{n,q} \cup e$ (the graph obtained by inserting $e$ into $G_{n,q}$ if it is not already present). Recall that, conditioned on $B_e = q$, only edges $f$ with $B_f < q$ are relevant for $Z_e$; so $e$ is added if and only if $Y_{e,H,q} = 0$. Hence for $q \in [0,1]$ we have

$$\mathbb{P}(Z_e \mid B_e = q) = \mathbb{P}(Y_{e,H,q} = 0). \tag{47}$$

For the lower bound in (46), fix $F \subseteq H$ with $d_2(F) = m_2(H)$ that satisfies $e_F \ge 2$ (this choice is possible as $e_H \ge 2$). Given $e$, there are at most $D n^{v_F-2}$ extensions to $F$ for some $D = D(F) > 0$, so whenever $q \le n^{-1/m_2(H)}$ holds, monotonicity and Harris' inequality [10] yield

$$\mathbb{P}(Y_{e,H,q} = 0) \ge \mathbb{P}(Y_{e,F,q} = 0) \ge \big(1 - q^{e_F-1}\big)^{D n^{v_F-2}} \ge e^{-2D n^{v_F-2} q^{e_F-1}} \ge e^{-2D}. \tag{48}$$

Together with (47) we obtain $\mathbb{P}(Z_e) = \mathbb{E}\,\mathbb{P}(Z_e \mid B_e) \ge n^{-1/m_2(H)} \cdot e^{-2D}$, and the lower bound in (46) now follows by linearity of expectation.
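The two elementary steps in (48), and one admissible value of $a$, can be made explicit as follows. This is a routine expansion rather than part of the original argument. It uses $1-x \ge e^{-2x}$ for $x \in [0,1/2]$ (the function $x \mapsto 1-x-e^{-2x}$ is concave and nonnegative at both endpoints), $q^{e_F-1} \le 1/2$ for large $n$, $d_2(F) = (e_F-1)/(v_F-2) = m_2(H)$ with $v_F \ge 3$, and $\binom{n}{2} \ge n^2/4$ for $n \ge 2$.

```latex
\begin{aligned}
\big(1-q^{e_F-1}\big)^{Dn^{v_F-2}} &\ge \exp\big(-2Dn^{v_F-2}q^{e_F-1}\big),\\
n^{v_F-2}q^{e_F-1} &\le n^{(v_F-2)-(e_F-1)/m_2(H)} = n^{0} = 1
  \qquad (q \le n^{-1/m_2(H)}),\\
\mathbb{E}\,e(G_{n,1}(H)) &= \sum_{f \in E(K_n)}\mathbb{P}(Z_f)
  \ge \binom{n}{2}\, n^{-1/m_2(H)}e^{-2D}
  \ge \tfrac{e^{-2D}}{4}\, n^{2-1/m_2(H)} .
\end{aligned}
```

So, under these assumptions, $a = e^{-2D}/4$ works in (46).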

Turning to the upper bound in (46), consider $q = \lambda n^{-1/m_2(H)}$ with $1 \le \lambda \le n^{1/m_2(H)}$. We apply Janson's inequality to $Y_{e,H,q}$, which counts the number of extensions of $e$ to $H$ (viewed as subgraphs, these do not contain the edge $e$). Note that $e_H \ge 2$, $\lambda \ge 1$ and $m_2(H) \ge (e_H - 1)/(v_H - 2)$ imply

$$\mu = \mathbb{E}\,Y_{e,H,q} = \Theta\big(n^{v_H-2} q^{e_H-1}\big) = n^{(v_H-2) - (e_H-1)/m_2(H)}\, \Theta\big(\lambda^{e_H-1}\big) = \Omega(\lambda).$$

Define $\mathcal{G}$ as the set of all proper subgraphs $G \subsetneq H$ with $e_G \ge 2$. Considering all possible 'overlaps' of extensions of $e$ to $H$ (analogous to the textbook proof of the small subgraphs theorem), the $\Delta$ term of Janson's inequality satisfies

$$\begin{aligned}
\Delta &\le \sum_{G \in \mathcal{G}} O\big(n^{v_H-2} q^{e_H-1}\big) \cdot O\big(n^{v_H-v_G} q^{e_H-e_G}\big) = O(\mu^2) \sum_{G \in \mathcal{G}} n^{-(v_G-2)} q^{-(e_G-1)} \\
&= O(\mu^2) \sum_{G \in \mathcal{G}} n^{-(v_G-2) + (e_G-1)/m_2(H)}\, \lambda^{-(e_G-1)} = O(\mu^2/\lambda),
\end{aligned}$$

where the last estimate follows from $e_G \ge 2$, $\lambda \ge 1$ and $m_2(H) \ge (e_G - 1)/(v_G - 2)$. So, using $\mu/\lambda = \Omega(1)$, we infer $\mu + \Delta = O(\mu^2/\lambda)$ and thus $\mu^2/(\mu + 2\Delta) = \Omega(\lambda) = \Omega(n^{1/m_2(H)} q)$. Applying Janson's inequality (see e.g. Theorem 2.18 in [35]) we have $\mathbb{P}(Y_{e,H,q} = 0) \le \exp(-C n^{1/m_2(H)} q)$ for some $C = C(H) > 0$. Combining this with (47) when $q \ge n^{-1/m_2(H)}$ and the trivial bound $\mathbb{P}(Z_e \mid B_e = q) \le 1$ otherwise, for $A = 1 + e^{-C}/C$ we obtain

$$\mathbb{P}(Z_e) = \mathbb{E}\,\mathbb{P}(Z_e \mid B_e) \le n^{-1/m_2(H)} + \int_{n^{-1/m_2(H)}}^{1} \exp\big(-C n^{1/m_2(H)} q\big)\,\mathrm{d}q \le A\, n^{-1/m_2(H)}. \tag{49}$$

Linearity of expectation now yields the upper bound in (46). □
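The last step of (49) is a one-line integral: with $a = n^{-1/m_2(H)}$ the integral equals $(a/C)(e^{-C} - e^{-C/a}) \le (a/C)e^{-C}$, which is where $A = 1 + e^{-C}/C$ comes from. A small numerical sanity check of this closed form might look as follows. The values of $a$ and $C$ are arbitrary illustrations, not tied to any particular $H$, and the function names are hypothetical.

```python
import math

def middle_of_49(a: float, C: float, steps: int = 100_000) -> float:
    """a + integral_a^1 exp(-C q / a) dq, evaluated with the midpoint rule."""
    h = (1.0 - a) / steps
    total = sum(math.exp(-C * (a + (k + 0.5) * h) / a) for k in range(steps))
    return a + h * total

def closed_form(a: float, C: float) -> float:
    """Exact value a + (a / C) * (exp(-C) - exp(-C / a))."""
    return a + (a / C) * (math.exp(-C) - math.exp(-C / a))

for a in (0.1, 0.01, 0.001):
    for C in (0.5, 1.0, 3.0):
        A = 1 + math.exp(-C) / C
        exact = closed_form(a, C)
        assert exact <= A * a * (1 + 1e-12)          # the bound used in (49)
        assert math.isclose(middle_of_49(a, C), exact, rel_tol=1e-4)
```

Dropping the term $-e^{-C/a}$ is the only loss, and it is negligible once $a$ is small.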
Our arguments partially generalize to arbitrary graphs, which we shall now briefly discuss. In this case Lemma 26 remains true if we modify $\mathcal{D}$ to at most, say, $\Psi_H = (\log n)\, n^{v_H-2} p^{e_H-1}$ copies, and so the coupling of Lemma 27 carries over (it only uses $\mathcal{I}$). With (44) in mind, Theorem 29 shows that the expected final number of edges is $\mu = \Theta(n^{2-1/m_2(H)})$. Adjusting the proof of Theorem 28 with $C_k = 2e_H\Psi_H$, a short calculation shows that we obtain concentration on an interval of length $\mu n^{-\gamma}$ with $\gamma = \gamma(H) > 0$ whenever

$$v_H \ge 4 \text{ and } m_2(H) < \frac{2e_H - 3}{2v_H - 6} \qquad\text{or}\qquad v_H = 3 \text{ and } e_H \ge 2. \tag{50}$$

Perhaps surprisingly, this condition is satisfied by standard examples of 'unbalanced' graphs such as a clique $K_r$ with an extra edge hanging off.
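The claim about $K_r$ with a pendant edge can be checked by brute force for small $r$. The sketch below (hypothetical helper names; exact arithmetic via `fractions.Fraction`) computes $m_2(H) = \max\{d_2(F) : F \subseteq H,\ v_F \ge 3\}$ with $d_2(F) = (e_F-1)/(v_F-2)$, restricting to induced subgraphs since extra edges only increase $d_2$, and then tests (50). For this family one gets $m_2(H) = d_2(K_r) = (r+1)/2$ while $d_2(H) = r/2$, and $(r+1)/2 < (r^2-r-1)/(2r-4)$ reduces to $r^2 - r - 2 < r^2 - r - 1$.

```python
from fractions import Fraction
from itertools import combinations

def m2(num_vertices, edges):
    """Brute-force m_2(H): max of (e_F - 1)/(v_F - 2) over induced subgraphs F
    on at least 3 vertices (starting from the convention d_2(K_2) = 1/2)."""
    best = Fraction(1, 2)
    for k in range(3, num_vertices + 1):
        for subset in combinations(range(num_vertices), k):
            s = set(subset)
            e_s = sum(1 for u, v in edges if u in s and v in s)
            best = max(best, Fraction(e_s - 1, k - 2))
    return best

def satisfies_50(num_vertices, edges):
    """Condition (50) for concentration on an interval of length mu * n^(-gamma)."""
    v_h, e_h = num_vertices, len(edges)
    if v_h >= 4:
        return m2(v_h, edges) < Fraction(2 * e_h - 3, 2 * v_h - 6)
    return v_h == 3 and e_h >= 2

def clique_with_pendant(r):
    """K_r on vertices 0..r-1 plus the extra edge (r-1, r)."""
    return r + 1, list(combinations(range(r), 2)) + [(r - 1, r)]

for r in range(3, 7):
    v_h, edges = clique_with_pendant(r)
    d2_h = Fraction(len(edges) - 1, v_h - 2)
    assert m2(v_h, edges) == Fraction(r + 1, 2) > d2_h   # 'unbalanced'
    assert satisfies_50(v_h, edges)
```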

The proofs in this section also extend with minor modifications to the more general reverse $\mathcal{H}$-free process considered in Theorem 20. In this case the 'inverted' processes $G_{n,m}(\mathcal{H})$ and $G_{n,p}(\mathcal{H})$ are defined in analogous ways, where an edge is added only when it closes no copy of any $F \in \mathcal{H}$. We need to modify $\mathcal{D}$ of Lemma 26 so that for all $F \in \mathcal{H}$ it ensures at most $\Psi_F = \max\{(\log n)\, n^{v_F-2} p^{e_F-1}, (\log n)^2\}$ copies, whereas the corresponding $\mathcal{I}$ only applies to the distinguished graph $H$ with $m_2(H) = d_2(H) = \min_{F \in \mathcal{H}} d_2(F)$. As before, once $\mathcal{I}$ holds no more edges are added. With this in mind, the coupling of Lemma 27 as well as the concentration result of Theorem 28 carry over in a straightforward way (noting that $d_2(H) \le d_2(F)$ implies $\Psi_F \le (\log n)^{2e_F}$ for all $F \in \mathcal{H}$). Turning to the expected final number of edges, for the lower bound of Theorem 29 we avoid all $F \in \mathcal{H}$ simultaneously. The resulting modification of (48) works for $q \le n^{-1/m_2(H)}$, since $d_2(H) \le d_2(F)$ implies $n^{v_F-2} q^{e_F-1} \le 1$. For the upper bound it suffices to just avoid the distinguished 2-balanced graph $H$, so we may reuse the estimates of (49) to establish Theorem 20.
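The parenthetical claim $\Psi_F \le (\log n)^{2e_F}$ can be verified in one line. This assumes (as in Lemma 26) that $p = n^{-1/m_2(H)}(\log n)^2$ with $m_2(H) = d_2(H)$ for the distinguished graph $H$, that $\log n \ge 1$, and that $v_F \ge 3$ with $e_F \ge 2$ (for $F = K_2$ both terms of the maximum are at most $(\log n)^2$ directly). The same exponent computation with $q \le n^{-1/m_2(H)}$ in place of $p$ gives $n^{v_F-2}q^{e_F-1} \le 1$.

```latex
(\log n)\,n^{v_F-2}p^{e_F-1}
  = n^{(v_F-2)-(e_F-1)/d_2(H)}\,(\log n)^{2e_F-1}
  \le n^{(v_F-2)-(e_F-1)/d_2(F)}\,(\log n)^{2e_F-1}
  = (\log n)^{2e_F-1} \le (\log n)^{2e_F},
\qquad (\log n)^2 \le (\log n)^{2e_F}.
```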

Finally, note that every edge added by $G_{n,m}(H)$ is also added by the $H$-free process defined in Section 1.2.3 (where $e_i$ is added if and only if it does not complete a copy of $H$ together with the added edges among $e_1, \ldots, e_{i-1}$). It follows from Theorem 29 that the expected final number of edges in the $H$-free process is at least $\Omega(n^{2-1/m_2(H)})$ for any graph $H$, which improves the $\Omega(n^{2-1/d_2(H)})$ bound resulting from the deletion argument of Osthus and Taraz [21]. In fact, if the technical conditions in (50) are satisfied, our earlier discussion implies that this lower bound also holds with probability tending to one (not only in expectation), which for 'unbalanced' graphs with $m_2(H) > d_2(H)$ does not follow from Theorem 1 in [31].
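The comparison with the $H$-free process is easy to test on small instances. In this hedged sketch for the assumed case $H = K_3$ (the helper name `run` is hypothetical), the $H$-free process only looks for triangles among previously added edges, while the reverse process looks among all previously traversed edges. The reverse condition is therefore the stronger one, so for the same edge order every edge kept by the reverse process is kept by the $H$-free process.

```python
import itertools
import random

def run(n, order, only_added):
    """Traverse `order`; e_i is added iff it closes no triangle with earlier
    edges (all traversed ones, or only the added ones if only_added is True)."""
    seen = {v: set() for v in range(n)}
    added = set()
    for u, v in order:
        closes = bool(seen[u] & seen[v])
        if not closes:
            added.add((u, v))
        if not closes or not only_added:
            seen[u].add(v)
            seen[v].add(u)
    return added

rng = random.Random(11)
for _ in range(20):
    n = rng.randrange(4, 16)
    order = list(itertools.combinations(range(n), 2))
    rng.shuffle(order)
    reverse_free = run(n, order, only_added=False)  # reverse K_3-free process
    h_free = run(n, order, only_added=True)         # K_3-free process
    assert reverse_free <= h_free
```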